Primary Nodes
The following fields appear in the output of db.getReplicationInfo() for primary nodes.
- errmsg
Returns the last error status.
MongoDB is a document-oriented database management system designed for performance, horizontal scalability, high availability, and advanced queryability. See the following wiki pages for more information about MongoDB:
If you want to download MongoDB, see the downloads page.
If you’d like to learn how to use MongoDB with your programming language of choice, see the introduction to the drivers.
The MongoDB documentation project provides a complete manual for the MongoDB database. This resource will eventually replace MongoDB’s original documentation.
This manual is licensed under a Creative Commons “Attribution-NonCommercial-ShareAlike 3.0 Unported” (i.e. “CC-BY-NC-SA”) license.
The MongoDB Manual is copyright © 2011-2012 10gen, Inc.
In addition to the <http://docs.mongodb.org/manual/> site, you can also access this content in the following editions provided for your convenience:
PDF files that provide access to subsets of the MongoDB Manual:
For Emacs and Info/Texinfo users, the following experimental Texinfo manuals are available for offline use:
Important
The texinfo manuals are experimental. If you find an issue with one of these editions, please file an issue in the DOCS Jira project.
This version of the manual reflects version 2.2.2 of MongoDB.
See the MongoDB Documentation Project Page for an overview of all editions and output formats of the MongoDB Manual. You can see the full revision history and track ongoing improvements and additions for all versions of the manual from its GitHub repository.
This edition reflects the “master” branch of the documentation as of the “312d6123917eff61bb399f14104636314934389d” revision. This branch is accessible via “http://docs.mongodb.org/master”, and you can always find the commit of the current manual in the release.txt file.
The most up-to-date, current, and stable version of the manual is always available at “http://docs.mongodb.org/manual/.”
The entire source of the documentation is available in the docs repository along with all of the other MongoDB project repositories on GitHub. You can clone the repository by issuing the following command at your system shell:
git clone git://github.com/mongodb/docs.git
If you have a GitHub account and want to fork this repository, you may issue pull requests, and someone on the documentation team will merge in your contributions promptly. In order to accept your changes to the Manual, you have to complete the MongoDB/10gen Contributor Agreement.
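The fork-and-pull-request flow described above can be sketched as follows. This is a local simulation, with a bare repository standing in for the GitHub remote so the snippet runs anywhere; the branch name, file, and commit message are placeholders, and on GitHub you would clone your fork’s URL instead.

```shell
# Stand-in for the remote fork; on GitHub you would clone your fork instead.
UPSTREAM=$(mktemp -d)
git init --bare -q "$UPSTREAM"

# Clone, create a topic branch, commit a change, and push it back.
WORK=$(mktemp -d)
git clone -q "$UPSTREAM" "$WORK"
cd "$WORK"
git checkout -q -b my-fix                     # topic branch (example name)
echo "fix" > note.txt                         # placeholder edit
git add note.txt
git -c user.email=docs@example.com -c user.name="Docs Contributor" \
    commit -q -m "describe the fix"
git push -q origin my-fix                     # publishes the branch to the fork
git branch --list my-fix
```

From a pushed branch like this, you would then open the pull request in the GitHub web interface.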
This project tracks issues at MongoDB’s DOCS project. If you see a problem with the documentation, please report it there.
The MongoDB Manual uses Sphinx, a sophisticated documentation engine built upon Python Docutils. The original reStructured Text files, as well as all necessary Sphinx extensions and build tools, are available in the same repository as the documentation.
You can view the documentation style guide and the build instructions in reStructured Text files in the top-level of the documentation repository. If you have any questions, please feel free to open a Jira Case.
MongoDB runs on most platforms, and supports 32-bit and 64-bit architectures. 10gen, the MongoDB makers, provides both binaries and packages. Choose your platform below:
This tutorial outlines the basic installation process for deploying MongoDB on Red Hat Enterprise Linux, CentOS Linux, Fedora Linux and related systems. This procedure uses .rpm packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .rpm packages for easy installation and management for users of Red Hat-based systems. While some of these distributions include their own MongoDB packages, the 10gen packages are generally more up to date.
This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.
See also
The documentation of the following related processes and concepts.
Other installation tutorials:
The 10gen repository contains four packages:
mongo-10gen
This package contains MongoDB tools from the latest stable release. Install this package on all production MongoDB hosts and optionally on other systems from which you may need to administer MongoDB systems.
mongo-10gen-server
This package contains the mongod and mongos daemons from the latest stable release and associated configuration and init scripts.
mongo18-10gen
This package contains MongoDB tools from the previous stable release. Install this package on all production MongoDB hosts and optionally on other systems from which you may need to administer MongoDB systems.
mongo18-10gen-server
This package contains the mongod and mongos daemons from the previous stable release and associated configuration and init scripts.
The MongoDB tools included in the mongo-10gen packages are:
Create a /etc/yum.repos.d/10gen.repo file to hold information about your repository. If you are running a 64-bit system (recommended), place the following configuration in the /etc/yum.repos.d/10gen.repo file:
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1
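You can write the repository definition above with a heredoc. The snippet below writes to a temporary file so it is safe to run anywhere; on a real system you would write /etc/yum.repos.d/10gen.repo as root.

```shell
# Stand-in for /etc/yum.repos.d/10gen.repo; on a real host, run as root
# and write to that path instead of a temporary file.
REPO=$(mktemp)
cat > "$REPO" <<'EOF'
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1
EOF
grep '^baseurl' "$REPO"
```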
If you are running a 32-bit system, which isn’t recommended for production deployments, place the following configuration in the /etc/yum.repos.d/10gen.repo file:
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
enabled=1
Issue the following command (as root or with sudo) to install the latest stable version of MongoDB and the associated tools:
yum install mongo-10gen mongo-10gen-server
When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.
These packages configure MongoDB using the /etc/mongod.conf file in conjunction with the control script. You can find the init script at /etc/rc.d/init.d/mongod.
This MongoDB instance will store its data files in /var/lib/mongo and its log files in /var/log/mongo, and run using the mongod user account.
Note
If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongo and /var/log/mongo directories.
Warning
With the introduction of systemd in Fedora 15, the control scripts included in the packages available in the 10gen repository are not compatible with Fedora systems. A correction is forthcoming; see SERVER-7285 for more information. In the meantime, use your own control scripts or install using the procedure outlined in Install MongoDB on Linux.
Start the mongod process by issuing the following command (as root, or with sudo):
service mongod start
You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongo/mongod.log.
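A quick way to check the log is to grep for the message mongod prints once it is ready to accept clients. The snippet below writes a sample log line to a temporary file so it runs anywhere; on a real host you would point grep at /var/log/mongo/mongod.log.

```shell
# On a real host:
#   grep "waiting for connections" /var/log/mongo/mongod.log
LOG=$(mktemp)   # stand-in for /var/log/mongo/mongod.log
echo '[initandlisten] waiting for connections on port 27017' > "$LOG"
grep "waiting for connections" "$LOG"
```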
Optionally, you can ensure that MongoDB will start following a system reboot by issuing the following command (with root privileges):
chkconfig mongod on
Stop the mongod process by issuing the following command (as root, or with sudo):
service mongod stop
You can restart the mongod process by issuing the following command (as root, or with sudo):
service mongod restart
Follow the state of this process by watching the output in the /var/log/mongo/mongod.log file for errors or important messages from the server.
As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongod control script referenced above to derive your own mongos control script.
You must configure SELinux to allow MongoDB to start on Fedora systems. Administrators have two options:
Among the tools included in the mongo-10gen package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that document.
> db.test.save( { a: 1 } )
> db.test.find()
See also
“mongo” and “JavaScript Interface“
This tutorial outlines the basic installation process for installing MongoDB on Ubuntu Linux systems. This tutorial uses .deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages for easy installation and management for users of Ubuntu systems. While Ubuntu includes its own MongoDB packages, the 10gen packages are generally more up to date.
This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.
Note
If you use an older Ubuntu that does not use Upstart, (i.e. any version before 9.10 “Karmic”) please follow the instructions on the Install MongoDB on Debian tutorial.
See also
The documentation of the following related processes and concepts.
Other installation tutorials:
The 10gen repository contains three packages:
mongodb-10gen
This package contains the latest stable release. Use this for production deployments.
mongodb20-10gen
This package contains the stable release of v2.0 branch.
mongodb18-10gen
This package contains the stable release of v1.8 branch.
You cannot install these packages concurrently with each other or with the mongodb package that your release of Ubuntu may include.
10gen also provides packages for “unstable” or development versions of MongoDB. Use the mongodb-10gen-unstable package to test the latest development release of MongoDB, but do not use this version in production.
The Ubuntu package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
Create a /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository.
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
Now issue the following command to reload your repository:
sudo apt-get update
Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.
These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You can find the control script at /etc/init.d/mongodb.
This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.
Note
If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.
You can start the mongod process by issuing the following command:
sudo service mongodb start
You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.
As needed, you may stop the mongod process by issuing the following command:
sudo service mongodb stop
You may restart the mongod process by issuing the following command:
sudo service mongodb restart
Among the tools included with the MongoDB package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record.
> db.test.save( { a: 1 } )
> db.test.find()
See also
“mongo” and “JavaScript Interface“
This tutorial outlines the basic installation process for installing MongoDB on Debian systems. This tutorial uses .deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages for easy installation and management for users of Debian systems. While some of these distributions include their own MongoDB packages, the 10gen packages are generally more up to date.
This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.
Note
If you’re running a version of Ubuntu Linux prior to 9.10 “Karmic,” use this tutorial. Other Ubuntu users will want to follow the Install MongoDB on Ubuntu tutorial.
See also
The documentation of the following related processes and concepts.
Other installation tutorials:
The 10gen repository contains three packages:
mongodb-10gen
This package contains the latest stable release. Use this for production deployments.
mongodb20-10gen
This package contains the stable release of v2.0 branch.
mongodb18-10gen
This package contains the stable release of v1.8 branch.
You cannot install these packages concurrently with each other or with the mongodb package that your release of Debian may include.
10gen also provides packages for “unstable” or development versions of MongoDB. Use the mongodb-10gen-unstable package to test the latest development release of MongoDB, but do not use this version in production.
The Debian package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
Create the /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository.
deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen
Now issue the following command to reload your repository:
sudo apt-get update
Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.
These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You can find the control script at /etc/init.d/mongodb.
This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.
Note
If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.
Issue the following command to start mongod:
sudo /etc/init.d/mongodb start
You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.
Among the tools included with the MongoDB package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record.
> db.test.save( { a: 1 } )
> db.test.find()
See also
“mongo” and “JavaScript Interface“
10gen provides compiled versions of MongoDB for Linux that provide a simple option for users who cannot use packages. This tutorial outlines the basic installation of MongoDB using these compiled versions and an initial usage guide.
See also
The documentation of the following related processes and concepts.
Other installation tutorials:
Note
You should place the MongoDB binaries in a central location on the file system that is easy to access and control. Consider /opt or /usr/local/bin.
In a terminal session, begin by downloading the latest release. In most cases you will want to download the 64-bit version of MongoDB.
curl http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.2.2.tgz > mongo.tgz
If you need to run the 32-bit version, use the following command.
curl http://downloads.mongodb.org/linux/mongodb-linux-i686-2.2.2.tgz > mongo.tgz
Once you’ve downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongo.tgz
Optional
You may use the following command to copy the extracted folder into a more generic location.
cp -R -n mongodb-linux-*-2.2.2/ mongodb
You can find the mongod binary, and the binaries for all of the associated MongoDB utilities, in the bin/ directory within the extracted directory.
Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory, use the following command:
mkdir -p /data/db
Note
Ensure that the system account that will run the mongod process has read and write permissions to this directory. If mongod runs under the mongo user account, issue the following command to change the owner of this folder:
chown mongo /data/db
If you use an alternate location for your data directory, ensure that this user can write to your chosen data path.
You can specify and create an alternate path using the --dbpath option to mongod, together with the command above.
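The alternate-path procedure can be sketched as follows. A temporary directory stands in for a path such as /srv/mongodb (an example location, not a MongoDB default), so the snippet is safe to run anywhere.

```shell
# Create the alternate data directory first; mongod does not create it.
ALT_DBPATH=$(mktemp -d)            # stand-in for e.g. /srv/mongodb
chmod 0700 "$ALT_DBPATH"           # only the account running mongod needs access

# Then start mongod against it (commented out here; requires the mongod
# binary on your PATH and a writable directory):
#   mongod --dbpath "$ALT_DBPATH"
ls -ld "$ALT_DBPATH"
```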
The 10gen builds of MongoDB contain no control scripts or method to control the mongod process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin or /usr/bin directory for easier use.
For testing purposes, you can start a mongod directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf
Note
The above command assumes that the mongod binary is accessible via your system’s search path, and that you have created a default configuration file located at /etc/mongod.conf.
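A minimal 2.2-era configuration file might look like the sketch below. The paths are examples only, and the file is written to a temporary location so the snippet runs anywhere; on a real host you would write /etc/mongod.conf as root.

```shell
# Stand-in for /etc/mongod.conf; the dbpath and logpath values are
# example locations, not requirements.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
# Example mongod settings; adjust paths for your system.
dbpath = /var/lib/mongo
logpath = /var/log/mongo/mongod.log
logappend = true
fork = true
EOF
cat "$CONF"
```

You would then start the server with mongod --config pointing at this file, as shown above.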
Among the tools included with this MongoDB distribution is the mongo shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt:
./bin/mongo
Note
The ./bin/mongo command assumes that the mongo binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file.
This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See also
“mongo” and “JavaScript Interface“
This tutorial outlines the basic installation process for deploying MongoDB on Macintosh OS X systems. This tutorial provides two main methods of installing the MongoDB server (i.e. “mongod”) and associated tools: first using the community package management tools, and second using builds of MongoDB provided by 10gen.
See also
The documentation of the following related processes and concepts.
Other installation tutorials:
Both community package management tools, Homebrew and MacPorts, require some initial setup and configuration, which is beyond the scope of this document. You only need to use one of these tools.
If you want to use package management and do not already have one of these tools installed, Homebrew is typically easier and simpler to use.
Homebrew installs binary packages based on published “formulae.” Issue the following command at the system shell to update the brew package manager:
brew update
Use the following command to install the MongoDB package into your Homebrew system.
brew install mongodb
Later, if you need to upgrade MongoDB, you can issue the following sequence of commands to update the MongoDB installation on your system:
brew update
brew upgrade mongodb
MacPorts distributes build scripts that allow you to easily build packages and their dependencies on your own system. The compilation process can take a significant period of time depending on your system’s capabilities and existing dependencies. Issue the following command in the system shell:
port install mongodb
The packages installed with Homebrew and MacPorts contain no control scripts and do not integrate with the system’s process manager.
If you have configured Homebrew and MacPorts correctly, including setting your PATH, the MongoDB applications and utilities will be accessible from the system shell. Start the mongod process in a terminal (for testing or development) or using a process management tool.
mongod
Then open the mongo shell by issuing the following command at the system prompt:
mongo
This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record.
> db.test.save( { a: 1 } )
> db.test.find()
See also
“mongo” and “JavaScript Interface“
10gen provides compiled binaries of all MongoDB software for OS X, which may provide a more straightforward installation process.
In a terminal session, begin by downloading the latest release. Use the following command at the system prompt:
curl http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.2.2.tgz > mongo.tgz
Note
The mongod process will not run on older Macintosh computers with PowerPC (i.e. non-Intel) processors.
Once you’ve downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongo.tgz
Optional
You may use the following command to move the extracted folder into a more generic location.
mv -n mongodb-osx-[platform]-[version]/ /path/to/new/location/
Replace [platform] with i386 or x86_64 depending on your system and the version you downloaded, and [version] with 2.2.2 or the version of MongoDB that you are installing.
You can find the mongod binary, and the binaries for all of the associated MongoDB utilities, in the bin/ directory within the archive.
Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory, and set the appropriate permissions use the following commands:
sudo mkdir -p /data/db
sudo chown `id -u` /data/db
You can specify an alternate path for data files using the --dbpath option to mongod.
The 10gen builds of MongoDB contain no control scripts or method to control the mongod process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin directory for easier use.
For testing purposes, you can start a mongod directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf
Note
This command assumes that the mongod binary is accessible via your system’s search path, and that you have created a default configuration file located at /etc/mongod.conf.
Among the tools included with this MongoDB distribution is the mongo shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt from inside the directory where you extracted mongo:
./bin/mongo
Note
The ./bin/mongo command assumes that the mongo binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file.
This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See also
“mongo” and “JavaScript Interface“
This tutorial provides a method for installing and running the MongoDB server (i.e. “mongod.exe”) on the Microsoft Windows platform through the Command Prompt and outlines the process for setting up MongoDB as a Windows Service.
Operating MongoDB on Windows is similar to operating it on other platforms. Most components share the same operational patterns.
Download the latest production release of MongoDB from the MongoDB downloads page.
There are three builds of MongoDB for Windows:
Changed in version 2.2: MongoDB does not support Windows XP. Please use a more recent version of Windows to use more recent releases of MongoDB.
Note
Always download the correct version of MongoDB for your Windows system. The 64-bit versions of MongoDB will not work with 32-bit Windows.
32-bit versions of MongoDB are suitable only for testing and evaluation purposes and only support databases smaller than 2GB.
You can find the architecture of your version of Windows using the following command in the Command Prompt:
wmic os get osarchitecture
In Windows Explorer, find the MongoDB download file, typically in the default Downloads directory. Extract the archive to C:\ by right clicking on the archive and selecting Extract All and browsing to C:\.
Note
The folder name will be either:
C:\mongodb-win32-i386-[version]
Or:
C:\mongodb-win32-x86_64-[version]
In both examples, replace [version] with the version of MongoDB downloaded.
Start the Command Prompt by selecting the Start Menu, then All Programs, then Accessories, then right click Command Prompt, and select Run as Administrator from the popup menu. In the Command Prompt, issue the following commands:
cd \
move C:\mongodb-win32-* C:\mongodb
Note
MongoDB is self-contained and does not have any other system dependencies. You can run MongoDB from any folder you choose. You may install MongoDB in any directory (e.g. D:\test\mongodb).
MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db. Create this folder using the Command Prompt. Issue the following command sequence:
md data
md data\db
Note
You may specify an alternate path for \data\db with the dbpath setting for mongod.exe, as in the following example:
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
If your path includes spaces, enclose the entire path in double quotations, for example:
C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongo db data"
To start MongoDB, execute from the Command Prompt:
C:\mongodb\bin\mongod.exe
This will start the main MongoDB database process. The waiting for connections message in the console output indicates that the mongod.exe process is running successfully.
Note
Depending on the security level of your system, Windows will issue a Security Alert dialog box about blocking “some features” of C:\mongodb\bin\mongod.exe from communicating on networks. All users should select Private Networks, such as my home or work network and click Allow access. For additional information on security and MongoDB, please read the Security and Authentication wiki page.
Warning
Do not allow mongod.exe to be accessible to public networks without running in “Secure Mode” (i.e. auth). MongoDB is designed to be run in “trusted environments” and the database does not enable authentication or “Secure Mode” by default.
Connect to MongoDB using the mongo.exe shell. Open another Command Prompt and issue the following command:
C:\mongodb\bin\mongo.exe
Note
Executing the command start C:\mongodb\bin\mongo.exe will automatically start the mongo.exe shell in a separate Command Prompt window.
The mongo.exe shell will connect to mongod.exe running on the localhost interface and port 27017 by default. At the mongo.exe prompt, issue the following two commands to insert a record in the test collection of the default test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()
See also
“mongo” and “JavaScript Interface.” If you want to develop applications using .NET, see the C# Language Center wiki page for more information.
New in version 2.0.
Set up MongoDB as a Windows Service so that the database will start automatically following each reboot cycle.
Note
mongod.exe added support for running as a Windows service in version 2.0, and mongos.exe added support for running as a Windows Service in version 2.1.1.
You should specify two options when running MongoDB as a Windows Service: a path for the log output (i.e. logpath) and a configuration file.
Create a specific directory for MongoDB log files:
md C:\mongodb\log
Create a configuration file for the logpath option for MongoDB in the Command Prompt by issuing this command:
echo logpath=C:\mongodb\log\mongo.log > C:\mongodb\mongod.cfg
While these steps are optional, creating a specific location for log files and using the configuration file are good practice.
Run all of the following commands in Command Prompt with “Administrative Privileges:”
To install the MongoDB service:
C:\mongodb\bin\mongod.exe --config C:\mongodb\mongod.cfg --install
Modify the path to the mongod.cfg file as needed. For the --install option to succeed, you must specify a logpath setting or the --logpath run-time option.
To run the MongoDB service:
net start MongoDB
Note
If you wish to use an alternate path for your dbpath, specify it in the configuration file (e.g. C:\mongodb\mongod.cfg) that you specified in the --install operation. You may also specify --dbpath on the command line; however, always prefer the configuration file.
If the dbpath directory does not exist, mongod.exe will not be able to start. The default value for dbpath is \data\db.
To stop the MongoDB service:
net stop MongoDB
To remove the MongoDB service:
C:\mongodb\bin\mongod.exe --remove
After you have installed MongoDB, consider the following documents as you begin to learn about MongoDB:
This tutorial provides an introduction to basic database operations using the mongo shell. mongo is a part of the standard MongoDB distribution and provides a full JavaScript environment with complete access to the JavaScript language and all standard functions, as well as a full database interface for MongoDB. See the mongo JavaScript API documentation and the mongo shell JavaScript Method Reference.
The tutorial assumes that you’re running MongoDB on a Linux or OS X operating system and that you have a running database server; MongoDB does support Windows and provides a Windows distribution with identical operation. For instructions on installing MongoDB and starting the database server see the appropriate installation document.
This tutorial addresses the following aspects of MongoDB use:
In this section you connect to the database server, which runs as mongod, and begin using the mongo shell to select a logical database within the database instance and access the help text in the mongo shell.
From a system prompt, start mongo by issuing the mongo command, as follows:
mongo
By default, mongo looks for a database server listening on port 27017 on the localhost interface. To connect to a server on a different port or interface, use the --port and --host options.
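For example, connecting to a server on another machine and a non-default port might look like the sketch below. The host name and port are placeholders, and the command is only assembled into a variable rather than executed, since running it requires a reachable mongod.

```shell
HOST=db1.example.net   # placeholder host name
PORT=27018             # placeholder non-default port
CMD="mongo --host $HOST --port $PORT"
echo "$CMD"
```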
After starting the mongo shell, your session will use the test database by default. At any time, issue the following operation at the mongo prompt to report the current database:
db
db returns the name of the current database.
From the mongo shell, display the list of databases with the following operation:
show dbs
Switch to a new database named mydb with the following operation:
use mydb
Confirm that your session has the mydb database as context, using the db operation, which returns the name of the current database as follows:
db
At this point, if you issue the show dbs operation again, it will not include mydb, because MongoDB will not create a database until you insert data into that database. The Create a Collection and Insert Documents section describes the process for inserting data.
At any point you can access help for the mongo shell using the following operation:
help
Furthermore, you can append the .help() method to some JavaScript methods, to any cursor object, and to the db and db.collection objects to return additional help information.
In this section, you insert documents into a new collection named things within the new database named mydb.
MongoDB will create collections and databases implicitly upon their first use: you do not need to create the database or collection before inserting data. Furthermore, because MongoDB uses dynamic schemas, you do not need to specify the structure of your documents before inserting them into the collection.
From the mongo shell, confirm that the current context is the mydb database with the following operation:
db
If mongo does not return mydb for the previous operation, set the context to the mydb database with the following operation:
use mydb
Create two documents, named j and k, with the following sequence of JavaScript operations:
j = { name : "mongo" }
k = { x : 3 }
Insert the j and k documents into the collection things with the following sequence of operations:
db.things.insert( j )
db.things.insert( k )
When you insert the first document, the mongod will create both the mydb database and the things collection.
Confirm that the collection named things exists using the following operation:
show collections
The mongo shell will return the list of the collections in the current (i.e. mydb) database. At this point, the only collection is things. All mongod databases also have a system.indexes collection.
Confirm that the documents exist in the collection things by issuing a query on the things collection using the find() method, as in the following operation:
db.things.find()
This operation returns the following results. Your ObjectId values will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
All MongoDB documents must have an _id field with a unique value. These operations do not explicitly specify a value for the _id field, so mongo creates a unique ObjectId value for the field before inserting it into the collection.
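For illustration, the 12-byte layout behind these ObjectId values (a 4-byte Unix timestamp, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter) can be sketched in plain JavaScript. This is a standalone sketch with illustrative function names, not the shell’s real ObjectId() constructor:

```javascript
// Sketch of how a 12-byte ObjectId is assembled (MongoDB 2.x layout):
// 4-byte Unix timestamp | 3-byte machine id | 2-byte process id | 3-byte counter.
// Illustrative only; the mongo shell's ObjectId() does this internally.
function makeObjectId(timestampSec, machineId, pid, counter) {
  const hex = (value, bytes) =>
    value.toString(16).padStart(bytes * 2, '0').slice(-bytes * 2);
  return hex(timestampSec, 4) + hex(machineId, 3) + hex(pid, 2) + hex(counter, 3);
}

// Recover the creation time from the leading 4 bytes, as the shell's
// ObjectId.getTimestamp() method does.
function objectIdTimestamp(oid) {
  return new Date(parseInt(oid.slice(0, 8), 16) * 1000);
}

const oid = makeObjectId(0x4c2209f9, 0xf3924d, 0x3110, 0x2bd84a);
console.log(oid); // 24 hex characters, matching the sample _id above
```

The leading 8 hex characters encode the creation time, which is why ObjectId values generated in sequence sort roughly by insertion order.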
From the mongo shell, add more documents to the things collection using the following for loop:
for (var i = 1; i <= 20; i++) db.things.insert( { x : 4 , j : i } )
Query the collection by issuing the following command:
db.things.find()
The mongo shell displays the first 20 documents in the collection. Your ObjectId values will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
The find() method returns a cursor. To iterate the cursor and return more documents, use the it operation in the mongo shell. The mongo shell will exhaust the cursor and return the following documents:
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
For more information on inserting new documents, see the Insert documentation.
When you query a collection, MongoDB returns a “cursor” object that contains the results of the query. The mongo shell then iterates over the cursor to display the results. Rather than returning all results at once, the shell iterates over the cursor 20 times to display the first 20 results and then waits for a request to iterate over the remaining results. This prevents mongo from displaying thousands or millions of results at once.
The it operation allows you to iterate over the next 20 results in the shell. In the previous procedure, the cursor only contained two more documents, so only two more documents displayed.
The procedures in this section show other ways to work with a cursor. For comprehensive documentation on cursors, see Cursor.
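The shell’s 20-at-a-time display can be modeled with a toy cursor in plain JavaScript, runnable without a server. The names makeCursor and displayBatch are illustrative only; the real shell cursor fetches batches from mongod:

```javascript
// Toy model of the shell's cursor display: show up to 20 documents per
// batch, as the mongo shell does, and let an "it"-style call fetch the
// next batch. Purely illustrative.
function makeCursor(docs) {
  let pos = 0;
  return {
    hasNext: () => pos < docs.length,
    next: () => docs[pos++],
  };
}

function displayBatch(cursor, batchSize = 20) {
  const shown = [];
  while (cursor.hasNext() && shown.length < batchSize) {
    shown.push(cursor.next());
  }
  return shown;
}

// 22 documents, like the tutorial's things collection.
const docs = Array.from({ length: 22 }, (_, i) => ({ x: 4, j: i + 1 }));
const cursor = makeCursor(docs);
console.log(displayBatch(cursor).length); // first 20 documents
console.log(displayBatch(cursor).length); // "it": the remaining 2
```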
In the MongoDB JavaScript shell, query the things collection and assign the resulting cursor object to the c variable:
var c = db.things.find()
Print the full result set by using a while loop to iterate over the c variable:
while ( c.hasNext() ) printjson( c.next() )
The hasNext() function returns true if the cursor has more documents. The next() method returns the next document. The printjson() method renders the document in a JSON-like format.
The result of this operation follows, although your ObjectId values will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
You can manipulate a cursor object as if it were an array. Consider the following procedure:
In the mongo shell, query the things collection and assign the resulting cursor object to the c variable:
var c = db.things.find()
To find the document at the array index 4, use the following operation:
printjson( c [ 4 ] )
MongoDB returns the following:
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
When you access documents in a cursor using the array index notation, mongo first calls the cursor.toArray() method and loads into RAM all documents returned by the cursor. The index is then applied to the resulting array. This operation iterates the cursor completely and exhausts the cursor.
For very large result sets, mongo may run out of available memory.
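This behavior can be sketched in plain JavaScript; the toArray and makeCursor helpers below are illustrative stand-ins for the shell’s real cursor machinery:

```javascript
// Sketch of cursor array indexing: c[4] first converts the entire result
// set to an array (like cursor.toArray()), exhausting the cursor, and
// then applies the index. Illustrative only.
function makeCursor(docs) {
  let pos = 0;
  return { hasNext: () => pos < docs.length, next: () => docs[pos++] };
}

function toArray(cursor) {
  const all = [];
  while (cursor.hasNext()) all.push(cursor.next()); // every document loaded into memory
  return all;
}

const cursor = makeCursor([{ j: 1 }, { j: 2 }, { j: 3 }, { j: 4 }, { j: 5 }]);
const asArray = toArray(cursor);   // the cursor is now exhausted
console.log(asArray[4]);           // { j: 5 }
console.log(cursor.hasNext());     // false: no documents remain
```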
For more information on the cursor, see Cursor.
MongoDB has a rich query system that allows you to select and filter the documents in a collection along specific fields and values. See Query Document and Read for a full account of queries in MongoDB.
In this procedure, you query for specific documents in the things collection by passing a “query document” as a parameter to the find() method. A query document specifies the criteria the query must match to return a document.
To query for specific documents, do the following:
In the mongo shell, query for all documents where the name field has a value of mongo by passing the { name : "mongo" } query document as a parameter to the find() method:
db.things.find( { name : "mongo" } )
MongoDB returns one document that fits these criteria. The ObjectId value will be different:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
Query for all documents where x has a value of 4 by passing the { x : 4 } query document as a parameter to find():
db.things.find( { x : 4 } )
MongoDB returns the following result set:
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
Again, your ObjectId values will differ, because ObjectId values are always unique.
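The equality matching that a query document performs can be sketched in plain JavaScript. This is a deliberately minimal model with illustrative function names; real MongoDB queries support many operators beyond simple equality:

```javascript
// Minimal sketch of query-document matching: a document matches when
// every field in the query document equals the corresponding field in
// the document. Real query documents support far richer criteria.
function matches(doc, query) {
  return Object.keys(query).every((field) => doc[field] === query[field]);
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const things = [
  { name: 'mongo' },
  { x: 3 },
  { x: 4, j: 1 },
  { x: 4, j: 2 },
];

console.log(find(things, { name: 'mongo' }).length); // 1
console.log(find(things, { x: 4 }).length);          // 2
```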
Query for all documents where x has a value of 4, as in the previous query, but return only the value of j. MongoDB will also return the _id field, unless explicitly excluded. To do this, add the { j : 1 } document as the projection in the second parameter to find(). This operation resembles the following:
db.things.find( { x : 4 } , { j : 1 } )
MongoDB returns the following results:
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "j" : 20 }
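The projection behavior, including the implicit inclusion of _id, can be sketched in plain JavaScript; the project helper below is an illustrative stand-in for the server’s projection logic and models only simple field inclusion and _id exclusion:

```javascript
// Sketch of a { j : 1 } projection: keep only the requested fields,
// plus _id unless it is explicitly excluded with _id : 0.
function project(doc, projection) {
  const out = {};
  if (projection._id !== 0 && '_id' in doc) out._id = doc._id;
  for (const field of Object.keys(projection)) {
    if (projection[field] === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const doc = { _id: 'aaa', x: 4, j: 1 };
console.log(project(doc, { j: 1 }));          // { _id: 'aaa', j: 1 }
console.log(project(doc, { j: 1, _id: 0 }));  // { j: 1 }
```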
With the db.collection.findOne() method you can return a single document from a MongoDB collection. The findOne() method takes the same parameters as find(), but returns a document rather than a cursor.
To retrieve one document from the things collection, issue the following command:
db.things.findOne()
For more information on querying for documents, see the Read and Read Operations documentation.
You can constrain the size of the result set to increase performance by limiting the amount of data your application must receive over the network.
To specify the maximum number of documents in the result set, call the limit() method on a cursor, as in the following command:
db.things.find().limit(3)
MongoDB will return the following result, with different ObjectId values:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
For more information on manipulating the documents in a database as you continue to learn MongoDB, consider the following resources:
You should always install the latest, stable version of MongoDB. Stable versions have an even-numbered minor version number. For example, v2.2 is stable, v2.0 and v1.8 were previously stable, while v2.1 and v2.3 are development versions.
Database replication ensures redundancy, backup, and automatic failover. Replication occurs through groups of servers known as replica sets.
This page lists the documents, tutorials, and reference pages that describe replica sets.
For an overview, see Replication Fundamentals. To work with members, see Replica Set Administration. To configure deployment architecture, see Replication Architectures. To modify read and write operations, see Application Development with Replica Sets. For procedures for performing certain replication tasks, see the list of replication tutorials.
The following is the outline of the main documentation:
A MongoDB replica set is a cluster of mongod instances that replicate amongst one another and ensure automated failover. Most replica sets consist of two or more mongod instances, with at most one of these designated as the primary and the rest as secondary members. Clients direct all writes to the primary, while the secondary members replicate from the primary asynchronously.
Database replication with MongoDB adds redundancy, helps to ensure high availability, simplifies certain administrative tasks such as backups, and may increase read capacity. Most production deployments use replication.
If you’re familiar with other database systems, you may think about replica sets as a more sophisticated form of traditional master-slave replication. [1] In master-slave replication, a master node accepts writes while one or more slave nodes replicate those write operations and thus maintain data sets identical to the master. For MongoDB deployments, the member that accepts write operations is the primary, and the replicating members are secondaries.
MongoDB’s replica sets provide automated failover. If a primary fails, the remaining members will automatically try to elect a new primary.
A replica set can have up to 12 members, but only 7 members can have votes. For information regarding non-voting members, see Non-Voting Members.
See also
The Replication index for a list of the documents in this manual that describe the operation and use of replica sets.
| [1] | MongoDB also provides conventional master/slave replication. Master/slave replication operates by way of the same mechanism as replica sets, but lacks the automatic failover capabilities. While replica sets are the recommended solution for production, a replica set can support only 12 members in total. If your deployment requires more than 11 slave members, you’ll need to use master/slave replication. |
You can configure replica set members in a variety of ways, as listed here. In most cases, members of a replica set have the default properties.
For more information about each member configuration, see the Member Configurations section in the Replica Set Administration document.
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.
For a detailed explanation of failover, see the Failover and Recovery section in the Replica Set Administration document.
When any failover occurs, an election takes place to decide which member should become primary.
Elections provide a mechanism for the members of a replica set to autonomously select a new primary without administrator intervention. The election allows replica sets to recover from failover situations very quickly and robustly.
Whenever the primary becomes unreachable, the secondary members trigger an election. The first member to receive votes from a majority of the set will become primary. The most important feature of replica set elections is that a majority of the original number of members in the replica set must be present for an election to succeed. If you have a three-member replica set, the set can elect a primary when two or three members can connect to each other. If two members in the replica set go offline, then the remaining member will remain a secondary.
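The majority arithmetic above can be sketched in plain JavaScript; the helper names are illustrative, not part of MongoDB:

```javascript
// A replica set needs votes from a majority of the original member
// count to elect a primary. Sketch of that arithmetic.
function majority(totalMembers) {
  return Math.floor(totalMembers / 2) + 1;
}

function canElectPrimary(votingMembersReachable, totalMembers) {
  return votingMembersReachable >= majority(totalMembers);
}

console.log(majority(3));           // 2
console.log(canElectPrimary(2, 3)); // true: two of three can elect a primary
console.log(canElectPrimary(1, 3)); // false: the lone member remains a secondary
```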
Note
When the current primary steps down and triggers an election, the mongod instances will close all client connections. This ensures that the clients maintain an accurate view of the replica set and helps prevent rollbacks.
For more information on elections and failover, see:
In a replica set, every member has a “priority” that helps determine eligibility for election to primary. By default, all members have a priority of 1, unless you modify the members[n].priority value. All members have a single vote in elections.
Warning
Always configure the members[n].priority value to control which members will become primary. Do not configure members[n].votes except to permit more than 7 secondary members.
For more information on member priorities, see the Adjusting Priority section in the Replica Set Administration document.
This section provides an overview of the concepts that underpin database consistency and the MongoDB mechanisms to ensure that users have access to consistent data.
In MongoDB, all read operations issued to the primary of a replica set are consistent with the last write operation.
If clients configure the read preference to allow secondary reads, read operations can return from secondary members that have not replicated more recent updates or operations. In these situations the query results may reflect a previous state.
This behavior is sometimes characterized as eventual consistency because the secondary member’s state will eventually reflect the primary’s state and MongoDB cannot guarantee strict consistency for read operations from secondary members.
There is no way to guarantee consistency for reads from secondary members, except by configuring the client and driver to ensure that write operations succeed on all members before completing successfully.
In some failover situations a primary will have accepted write operations that had not replicated to the secondaries when the failover occurred. This case is rare and typically occurs as a result of a network partition with replication lag. When this member (the former primary) rejoins the replica set and attempts to continue replication as a secondary, it must revert or “roll back” these operations to maintain database consistency across the replica set.
MongoDB writes the rollback data to BSON files in a rollback/ folder within the database’s dbpath directory. Use bsondump to read the contents of these rollback files and then manually apply the changes to the new primary. There is no way for MongoDB to appropriately and fairly handle rollback situations automatically, so you must intervene manually to apply rollback data. Even after the member completes the rollback and returns to secondary status, administrators will need to apply or decide to ignore the rollback data. Rollback files have names of the following form:
<database>.<collection>.<timestamp>.bson
For example:
records.accounts.2011-05-09T18-10-04.0.bson
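Given that naming convention, a filename can be split back into its parts. The sketch below is plain JavaScript and assumes, as in the example, that the database name contains no dots and that the timestamp occupies the final two dot-separated segments; it is illustrative only:

```javascript
// Pull the database, collection, and timestamp back out of a rollback
// filename of the form <database>.<collection>.<timestamp>.bson.
// Naive sketch: assumes a dot-free database name and a timestamp that
// spans the last two dot-separated segments.
function parseRollbackFilename(filename) {
  const parts = filename.replace(/\.bson$/, '').split('.');
  return {
    database: parts[0],
    collection: parts.slice(1, -2).join('.'),
    timestamp: parts.slice(-2).join('.'),
  };
}

const info = parseRollbackFilename('records.accounts.2011-05-09T18-10-04.0.bson');
console.log(info.database);   // records
console.log(info.collection); // accounts
console.log(info.timestamp);  // 2011-05-09T18-10-04.0
```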
The best strategy for avoiding all rollbacks is to ensure write propagation to all or some of the members in the set. Using these kinds of policies prevents situations that might create rollbacks.
Warning
A mongod instance will not roll back more than 300 megabytes of data. If your system must roll back more than 300 megabytes, you will need to intervene manually to recover the data. If this is the case, you will find the following line in your mongod log:
[replica set sync] replSet syncThread: 13410 replSet too much data to roll back
In these situations you will need to intervene manually, either to save the data or to resync from a “current” member of the set by deleting the content of the existing dbpath directory and resuming normal operation.
For more information on failover, see:
Client applications are indifferent to the configuration and operation of replica sets. While specific configuration depends to some extent on the client drivers, there is often minimal or no difference between applications using replica sets or standalone instances.
There are two major concepts that are important to consider when working with replica sets:
Write concern sends the MongoDB client a response from the server to confirm successful write operations. In replica sets, you can configure replica acknowledged write concern to ensure that secondary members of the set have replicated operations before the write returns.
By default, read operations issued against a replica set return results from the primary. Users may configure read preference on a per-connection basis to prefer that read operations return from the secondary members.
Read preference and write concern have particular consistency implications.
For a more detailed discussion of application concerns, see Application Development with Replica Sets.
This section provides a brief overview of concerns relevant to administrators of replica set deployments.
For more information on replica set administration, operations, and architecture, see:
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations in the primary’s oplog. The secondary members then replicate this log and apply the operations to themselves in an asynchronous process. All replica set members contain a copy of the oplog, allowing them to maintain the current state of the database. Operations in the oplog are idempotent.
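Idempotency here means that replaying an oplog entry leaves the data unchanged; for example, an increment is recorded in the oplog as a $set of the resulting value. A minimal plain-JavaScript sketch, modeling only the $set form:

```javascript
// Oplog entries are idempotent: applying the same entry twice leaves
// the document unchanged. An { $inc: { count: 1 } } update against
// count: 5 is recorded in the oplog as a $set of the resulting value.
// Illustrative sketch; only $set is modeled here.
function applyOplogEntry(doc, entry) {
  return { ...doc, ...entry.$set };
}

let doc = { _id: 1, count: 5 };
const entry = { $set: { count: 6 } }; // recorded form of the increment

doc = applyOplogEntry(doc, entry);
doc = applyOplogEntry(doc, entry);    // replaying the entry is harmless
console.log(doc.count); // 6
```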
By default, the size of the oplog is as follows:
For 64-bit Linux, Solaris, and FreeBSD systems, MongoDB will allocate 5% of the available free disk space to the oplog.
If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space.
For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog.
For 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
Before oplog creation, you can specify the size of your oplog with the oplogSize option. After you start a replica set member for the first time, you can change the size of the oplog only by following the procedure in the Change the Size of the Oplog tutorial.
In most cases, the default oplog size is sufficient. For example, if an oplog that is 5% of free disk space fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for 24 hours before they require full resyncing. However, most replica sets have much lower operation volumes, and their oplogs can hold a much larger number of operations.
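The sizing rule for 64-bit Linux, Solaris, and FreeBSD systems reduces to a small calculation. This plain-JavaScript sketch models only the 5%-of-free-disk rule with its 1-gigabyte floor, as stated above:

```javascript
// Default oplog size on 64-bit Linux, Solaris, and FreeBSD: 5% of free
// disk space (i.e. one twentieth), but never less than 1 gigabyte.
// Illustrative calculation only.
const GB = 1024 * 1024 * 1024;

function defaultOplogSizeBytes(freeDiskBytes) {
  return Math.max(freeDiskBytes / 20, GB);
}

console.log(defaultOplogSizeBytes(10 * GB) / GB);  // 1: 5% would be only 0.5 GB
console.log(defaultOplogSizeBytes(100 * GB) / GB); // 5
```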
The following factors affect how MongoDB uses space in the oplog:
Update operations that affect multiple documents at once.
The oplog must translate multi-updates into individual operations, in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in disk utilization.
If you delete roughly the same amount of data as you insert.
In this situation the database will not grow significantly in disk utilization, but the size of the operation log can be quite large.
If a significant portion of your workload entails in-place updates.
In-place updates create a large number of operations but do not change the quantity of data on disk.
If you can predict that your replica set’s workload will resemble one of the above patterns, you may want to create an oplog that is larger than the default. Conversely, if the predominant activity of your MongoDB-based application is reads and you write only a small amount of data, you may find that you need a much smaller oplog.
To view oplog status, including the size and the time range of operations, issue the db.printReplicationInfo() method. For more information on oplog status, see Check the Size of the Oplog.
For additional information about oplog behavior, see Oplog Internals and Syncing.
Without replication, a standalone MongoDB instance represents a single point of failure, and any disruption of the MongoDB system will render the database unusable and potentially unrecoverable. Replication increases the reliability of the database instance, and replica sets are capable of distributing reads to secondary members depending on read preference. For database workloads dominated by read operations (i.e. “read heavy”), replica sets can greatly increase the capability of the database system.
The minimum requirements for a replica set include two members with data, for a primary and a secondary, and an arbiter. In most circumstances, however, you will want to deploy three data members.
For those deployments that rely heavily on distributing reads to secondary instances, add additional members to the set as load increases. As your deployment grows, consider adding or moving replica set members to secondary data centers or to geographically distinct locations for additional redundancy. While many architectures are possible, always ensure that the quorum of members required to elect a primary remains in your main facility.
Depending on your operational requirements, you may consider adding members configured for a specific purpose, including: a delayed member to help provide protection against human errors and change control, a hidden member to provide an isolated member for reporting and monitoring, and/or a secondary-only member for dedicated backups.
The process of establishing a new replica set member can be resource intensive for existing members. As a result, deploy new members to existing replica sets well before current demand saturates the existing members.
Note
Journaling provides single-instance write durability and greatly improves the reliability and durability of a database. Unless MongoDB runs with journaling, when a MongoDB instance terminates ungracefully the database can end up in a corrupt and unrecoverable state.
You should assume that a database running without journaling that suffers a crash or unclean shutdown is in a corrupt or inconsistent state.
Use journaling; however, do not forego proper replication because of journaling.
64-bit versions of MongoDB after version 2.0 have journaling enabled by default.
In most cases, replica set administrators do not have to keep additional considerations in mind beyond the normal security precautions that all MongoDB administrators must take. However, ensure that:
For more information, see the Security Considerations for Replica Sets section in the Replica Set Administration document.
The architecture and design of the replica set deployment can have a great impact on the set’s capacity and capability. This section provides a general overview of best practices for replica set architectures.
This document provides an overview of the complete functionality of replica sets, which highlights the flexibility of the replica set and its configuration. However, for most production deployments a conventional 3-member replica set with members[n].priority values of 1 is sufficient.
While the additional flexibility discussed below is helpful for managing a variety of operational complexities, it always makes sense to let complex requirements dictate complex architectures, rather than add unnecessary complexity to your deployment.
Consider the following factors when developing an architecture for your replica set:
For more information regarding replica set configuration and deployments see Replication Architectures.
Replica sets automate most administrative tasks associated with database replication. Nevertheless, several operations related to deployment and systems management still require administrator intervention. This document provides an overview of those tasks, in addition to a collection of troubleshooting suggestions for administrators of replica sets.
See also
The following tutorials provide task-oriented instructions for specific administrative tasks related to replica set operation.
All replica sets have a single primary and one or more secondaries. Replica sets allow you to configure secondary members in a variety of ways. This section describes these configurations.
Note
A replica set can have up to 12 members, but only 7 members can have votes. For configuration information regarding non-voting members, see Non-Voting Members.
Warning
The rs.reconfig() shell method can force the current primary to step down, which causes an election. When the primary steps down, the mongod closes all client connections. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods. To successfully reconfigure a replica set, a majority of the members must be accessible.
See also
The Elections section in the Replication Fundamentals document, and the Election Internals section in the Replication Internals document.
The secondary-only configuration prevents a secondary member in a replica set from ever becoming a primary in a failover. You can set secondary-only mode for any member of the set.
For example, you may want to configure all members of a replica sets located outside of the main data centers as secondary-only to prevent these members from ever becoming primary.
To configure a member as secondary-only, set its members[n].priority value to 0. Any member with a members[n].priority equal to 0 will never seek election and cannot become primary in any situation. For more information on priority levels, see Member Priority.
Note
When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.
The _id rarely corresponds to the array index.
As an example of modifying member priorities, assume a four-member replica set. Use the following sequence of operations in the mongo shell to modify member priorities:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[1].priority = 0.5
cfg.members[2].priority = 1
cfg.members[3].priority = 2
rs.reconfig(cfg)
This configures the set with the following priority settings:
Note
If your replica set has an even number of members, add an arbiter to ensure that members can quickly obtain a majority of votes in an election for primary.
See also
Delayed members copy and apply operations from the primary’s oplog with a specified delay. If a member has a delay of one hour, then the latest entry in this member’s oplog will not be more recent than one hour old, and the state of data for the member will reflect the state of the set an hour earlier.
Example
If the current time is 09:52 and the secondary is delayed by an hour, no operation will be more recent than 08:52.
Delayed members may help recover from various kinds of human error. Such errors may include inadvertently deleted databases or botched application upgrades. Consider the following factors when determining the amount of slave delay to apply:
Delayed members must have a priority set to 0 to prevent them from becoming primary in their replica sets. Also these members should be hidden to prevent your application from seeing or querying this member.
To configure a replica set member with a one hour delay, use the following sequence of operations in the mongo shell:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].slaveDelay = 3600
rs.reconfig(cfg)
After the replica set reconfigures, the first member of the set in the members array will have a priority of 0 and cannot become primary. The slaveDelay value delays both replication and the member's oplog by 3600 seconds (1 hour). Setting slaveDelay to a non-zero value also sets hidden to true for this replica set member so that it does not receive application queries in normal operations.
Warning
The length of the secondary slaveDelay must fit within the window of the oplog. If the oplog is shorter than the slaveDelay window, the delayed member cannot successfully replicate operations.
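The arithmetic behind this warning can be sketched as follows. The function name and inputs are illustrative only, not part of the mongo shell API; the event times correspond to the first and last oplog event times that db.printReplicationInfo() reports:

```javascript
// Sketch: verify that a proposed slaveDelay value fits within a
// member's current oplog window, using the first and last oplog event
// times reported by db.printReplicationInfo(). Illustrative only.
function delayFitsOplog(firstEventTime, lastEventTime, slaveDelaySecs) {
  // The oplog window is the span of time covered by retained entries.
  var windowSecs = (lastEventTime - firstEventTime) / 1000;
  return slaveDelaySecs < windowSecs;
}

// Example: an oplog covering roughly 26 hours of operations.
var first = new Date("2012-03-19T13:50:38Z");
var last = new Date("2012-03-20T15:59:10Z");
delayFitsOplog(first, last, 3600);   // a one-hour delay fits
delayFitsOplog(first, last, 172800); // a two-day delay does not
```

If the check fails, either reduce the delay or increase the oplog size before configuring the delayed member.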
See also
members[n].slaveDelay, Replica Set Reconfiguration, Oplog, Changing Oplog Size in this document, and the Change the Size of the Oplog tutorial.
Arbiters are special mongod instances that do not hold a copy of the data and thus cannot become primary. Arbiters exist solely to participate in elections.
Note
Because of their minimal system requirements, you may safely deploy an arbiter on a system with another workload, such as an application server or monitoring member.
Warning
Do not run arbiter processes on a system that is an active primary or secondary of its replica set.
Arbiters never receive the contents of any collection but do have the following interactions with the rest of the replica set:
Credential exchanges that authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.
MongoDB only transmits the authentication credentials in a cryptographically secure exchange, and encrypts no other exchange.
Exchanges of replica set configuration data and of votes. These are not encrypted.
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Using MongoDB with SSL Connections for more information. As with all MongoDB components, run arbiters on secure networks.
To add an arbiter, see Adding an Arbiter.
You may choose to change the number of votes that each member has in elections for primary. In general, all members should have only 1 vote to prevent intermittent ties, deadlock, or the wrong members from becoming primary. Use replica set priorities to control which members are more likely to become primary.
To disable a member’s ability to vote in elections, use the following command sequence in the mongo shell.
cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[4].votes = 0
cfg.members[5].votes = 0
rs.reconfig(cfg)
This sequence gives 0 votes to the fourth, fifth, and sixth members of the set according to the order of the members array in the output of rs.conf(). This setting allows the set to elect these members as primary but does not allow them to vote in elections. If you have three non-voting members, you can add three additional voting members to your set. Place voting members so that your designated primary or primaries can reach a majority of votes in the event of a network partition.
Note
In general and when possible, all members should have only 1 vote. This prevents intermittent ties, deadlocks, or the wrong members from becoming primary. Use Replica Set Priorities to control which members are more likely to become primary.
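As a sketch of the underlying arithmetic, a candidate must receive a majority of the set's total votes to win an election. The helper below is illustrative only, not MongoDB's election code; it assumes each member defaults to 1 vote unless a votes value is set:

```javascript
// Sketch: compute how many votes constitute a majority for an election,
// given each member's votes value from rs.conf(). Illustrative only.
function majorityThreshold(members) {
  var totalVotes = members.reduce(function (sum, m) {
    return sum + (m.votes === undefined ? 1 : m.votes);
  }, 0);
  return Math.floor(totalVotes / 2) + 1;
}

// A six-member set where three members have votes: 0 has three voting
// members, so two votes are required to elect a primary.
var sixMembers = [
  { _id: 0 }, { _id: 1 }, { _id: 2 },
  { _id: 3, votes: 0 }, { _id: 4, votes: 0 }, { _id: 5, votes: 0 }
];
majorityThreshold(sixMembers); // 2
```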
See also
This section gives overview information on certain procedures. Most procedures, however, are found in the replica set tutorials.
Before adding a new member to an existing replica set, do one of the following to prepare the new member’s data directory:
Make sure the new member’s data directory does not contain data. The new member will copy the data from an existing member.
If the new member is in a recovering state, it must exit and become a secondary before MongoDB can copy all data as part of the replication process. This process takes time but does not require administrator intervention.
Manually copy the data directory from an existing member. The new member becomes a secondary member and will catch up to the current state of the replica set after a short interval. Copying the data over manually shortens the amount of time for the new member to become current.
Ensure that you can copy the data directory to the new member and begin replication within the window allowed by the oplog. If the time between the most recent operation in the copied data and the most recent operation applied to the database exceeds the length of the oplog on the existing members, then the new instance will have to completely resynchronize, as described in Resyncing a Member of a Replica Set.
Use db.printReplicationInfo() to check the current state of replica set members with regards to the oplog.
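The oplog-window constraint above can be sketched as a simple comparison. The function and its inputs are illustrative; lastOpInCopy is the time of the newest operation in the copied data, and oplogWindowSecs comes from db.printReplicationInfo() on an existing member:

```javascript
// Sketch: estimate whether a copied data directory is recent enough
// for a new member to catch up via the oplog. Illustrative only.
function copyIsFreshEnough(lastOpInCopy, now, oplogWindowSecs) {
  var gapSecs = (now - lastOpInCopy) / 1000;
  return gapSecs < oplogWindowSecs;
}

var lastOp = new Date("2012-10-03T10:00:00Z");
var now = new Date("2012-10-03T15:00:00Z");
copyIsFreshEnough(lastOp, now, 94400); // a 5-hour gap fits a ~26-hour oplog
```

If the check fails, the new member must perform a full resync instead of catching up from the oplog.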
For the procedure to add a member to a replica set, see Add Members to a Replica Set.
You may remove a member of a replica set at any time; however, for best results always shut down the mongod instance before removing it from a replica set.
Changed in version 2.2: Before 2.2, you had to shut down the mongod instance before removing it. While 2.2 removes this requirement, it remains good practice.
To remove a member, use the rs.remove() method in the mongo shell while connected to the current primary. Issue the db.isMaster() command when connected to any member of the set to determine the current primary. Use a command in either of the following forms to remove the member:
rs.remove("mongo2.example.net:27017")
rs.remove("mongo3.example.net")
This operation disconnects the shell briefly and forces a re-connection as the replica set renegotiates which member will be primary. The shell displays an error even if this command succeeds.
You can re-add a removed member to a replica set at any time using the procedure for adding replica set members. Additionally, consider using the replica set reconfiguration procedure to change the members[n].host value to rename a member in a replica set directly.
Use this procedure to replace a member of a replica set when the hostname has changed. This procedure preserves all existing configuration for a member, except its hostname/location.
You may need to replace a replica set member if you want to replace an existing system and only need to change the hostname rather than completely replace all configured options related to the previous member.
Use rs.reconfig() to change the value of the members[n].host field to reflect the new hostname or port number. rs.reconfig() will not change the value of members[n]._id.
cfg = rs.conf()
cfg.members[0].host = "mongo2.example.net:27019"
rs.reconfig(cfg)
To change the value of the members[n].priority value in the replica set configuration, use the following sequence of commands in the mongo shell:
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)
The first operation uses rs.conf() to set the local variable cfg to the contents of the current replica set configuration, which is a document. The next three operations change the members[n].priority value in the cfg document for the first three members configured in the members array. The final operation calls rs.reconfig() with the argument of cfg to initialize the new configuration.
Note
When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.
The _id rarely corresponds to the array index.
If a member has members[n].priority set to 0, it is ineligible to become primary and will not seek election. Hidden members, delayed members, and arbiters all have members[n].priority set to 0.
All members have a members[n].priority equal to 1 by default.
The value of members[n].priority can be any floating point (i.e. decimal) number between 0 and 1000. Priorities are only used to determine the preference in election. The priority value is used only in relation to other members. With the exception of members with a priority of 0, the absolute value of the members[n].priority value is irrelevant.
Replica sets will preferentially elect and maintain the primary status of the member with the highest members[n].priority setting.
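The preference rule can be sketched as follows. This mirrors the rule described above and is not MongoDB's actual election code; the function name is illustrative:

```javascript
// Sketch: among electable members, the member with the highest
// priority is preferred as primary. Members with priority 0 never
// seek election. Illustrative only.
function preferredPrimary(members) {
  var eligible = members.filter(function (m) { return m.priority > 0; });
  eligible.sort(function (a, b) { return b.priority - a.priority; });
  return eligible.length ? eligible[0]._id : null;
}

var fourMembers = [
  { _id: 0, priority: 0 },   // never primary
  { _id: 1, priority: 0.5 },
  { _id: 2, priority: 1 },
  { _id: 3, priority: 2 }    // preferred primary
];
preferredPrimary(fourMembers); // 3
```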
Warning
Replica set reconfiguration can force the current primary to step down, leading to an election for primary in the replica set. Elections cause the current primary to close all open client connections.
Perform routine replica set reconfiguration during scheduled maintenance windows.
See also
The Replica Reconfiguration Usage example revolves around changing the priorities of the members of a replica set.
For a description of arbiters and their purpose in replica sets, see Arbiters.
To prevent tied elections, do not add an arbiter to a set if the set already has an odd number of voting members.
Because arbiters do not hold copies of collection data, they have minimal resource requirements and do not require dedicated hardware.
Create a data directory for the arbiter. The mongod uses this directory for configuration information. It will not hold database collection data. The following example creates the /data/arb data directory:
mkdir /data/arb
Start the arbiter, making sure to specify the replica set name and the data directory. Consider the following example:
mongod --port 30000 --dbpath /data/arb --replSet rs
In a mongo shell connected to the primary, add the arbiter to the replica set by issuing the rs.addArb() method, which uses the following syntax:
rs.addArb("<hostname>:<port>")
For example, if the arbiter runs on m1.example.net:30000, you would issue this command:
rs.addArb("m1.example.net:30000")
The following is an overview of the procedure for changing the size of the oplog. For a detailed procedure, see Change the Size of the Oplog.
When a secondary’s replication process falls behind so far that the primary overwrites oplog entries that the secondary has not yet replicated, that secondary cannot catch up and becomes “stale.” When that occurs, you must resync the member by removing its data and replacing it with up-to-date data.
To do so, use one of the following approaches:
Restart the mongod with an empty data directory and let MongoDB’s normal replication syncing feature restore the data. This is the simpler option but may take longer to replace the data.
Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure can replace the data more quickly but requires more manual steps.
This procedure relies on MongoDB’s regular process for syncing a new member to restore the data on the stale member. For an overview of how MongoDB syncs replica sets, see the Syncing section.
To resync the stale member:
Stop the member’s mongod instance using the mongod --shutdown option. Make sure to set --dbpath to the member’s data directory, as in the following:
mongod --dbpath /data/db/ --shutdown
Delete all data and sub-directories from the member’s data directory such that the directory is empty.
Restart the mongod instance on the member. Consider the following example:
mongod --dbpath /data/db/ --replSet rsProduction
At this point, the mongod will resync. This process may take a long time, depending on the size of the database and the speed of the network. Remember that this operation may affect the working set and network traffic on the primary and other members of the set.
This approach uses the data directory of an existing member to “seed” the stale member. The data must be recent enough to allow the new member to catch up with the oplog.
To resync by copying data from another member, use one of the following approaches:
In most cases, the most effective ways to control access and to secure the connection between members of a replica set depend on network-level access control. Use your environment’s firewall and network routing to ensure that traffic only from clients and other replica set members can reach your mongod instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs.)
Additionally, MongoDB provides an authentication mechanism for mongod and mongos instances connecting to replica sets. These instances enable authentication but specify a shared key file that serves as a shared password.
New in version 1.8: Added support for authentication in replica sets (1.9.1 for sharded replica sets).
To enable authentication add the following option to your configuration file:
keyFile = /srv/mongodb/keyfile
Note
You may choose to set these run-time configuration options using the mongod --keyFile (or mongos --keyFile) options on the command line.
Setting keyFile enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary but must be the same on all members of the replica set and on all mongos instances that connect to the set.
The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or “world” permissions on UNIX systems. Use the following command with the OpenSSL package to generate “random” content for use in a key file:
openssl rand -base64 753
Note
Key file permissions are not checked on Windows systems.
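The documented content constraints can be sketched as a simple validation. This helper is illustrative only, not the server's actual validation code; whitespace is stripped because MongoDB ignores it when reading the key file:

```javascript
// Sketch: check that key file content satisfies the documented
// constraints: under one kilobyte, and only base64 characters
// (plus ignorable whitespace). Illustrative only.
function keyFileLooksValid(content) {
  if (content.length >= 1024) return false;
  var stripped = content.replace(/\s/g, "");
  return /^[A-Za-z0-9+/=]+$/.test(stripped);
}

keyFileLooksValid("c2VjcmV0a2V5Zm9ycmVwbGljYXNldA==\n"); // true
keyFileLooksValid("not valid: spaces & punctuation!");   // false
```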
This section describes common strategies for troubleshooting replica sets.
See also
To display the current state of the replica set and current state of each member, run the rs.status() method in a mongo shell connected to the replica set’s primary. For descriptions of the information displayed by rs.status(), see Replica Set Status Reference.
Note
The rs.status() method is a wrapper that runs the replSetGetStatus database command.
Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes “lagged” members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.
To check the current length of replication lag:
In a mongo shell connected to the primary, call the db.printSlaveReplicationInfo() method.
The returned document displays the syncedTo value for each member, which shows you when each member last read from the oplog, as shown in the following example:
source: m1.example.net:30001
syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
= 7475 secs ago (2.08hrs)
source: m2.example.net:30002
syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
= 7475 secs ago (2.08hrs)
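The lag figures in this output can be reproduced with a small computation. The function below is an illustrative sketch of what db.printSlaveReplicationInfo() reports, not driver or shell code:

```javascript
// Sketch: compute replication lag in seconds and hours from a
// member's syncedTo time. Illustrative only.
function replicationLag(syncedTo, now) {
  var secs = Math.round((now - syncedTo) / 1000);
  return { secs: secs, hours: (secs / 3600).toFixed(2) };
}

var syncedTo = new Date("2012-10-02T15:33:40Z"); // 11:33:40 GMT-0400
var now = new Date(syncedTo.getTime() + 7475 * 1000);
replicationLag(syncedTo, now); // { secs: 7475, hours: "2.08" }
```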
Monitor the rate of replication by watching the oplog time in the “replica” graph in the MongoDB Monitoring Service. For more information see the documentation for MMS.
Possible causes of replication lag include:
Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.
Use tools such as ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.
Disk Throughput
If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping pace. Disk-related issues are especially prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon’s EBS system).
Use system-level tools to assess disk status, including iostat or vmstat.
Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries, as described in Write Concern. This prevents write operations from returning if replication cannot keep up with the write load.
Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.
Appropriate Write Concern
If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.
To prevent this, require write acknowledgment or journaled write concern after every 100, 1,000, or another interval of operations to provide an opportunity for secondaries to catch up with the primary.
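The batching pattern can be sketched as follows. The insert and ack callbacks are hypothetical stand-ins for a real collection insert and a getLastError call with w: "majority"; they are not driver APIs:

```javascript
// Sketch: during a bulk load, request acknowledgment every `interval`
// documents so secondaries have a chance to keep up. Illustrative only.
function bulkLoad(docs, interval, insert, ack) {
  var acks = 0;
  docs.forEach(function (doc, i) {
    insert(doc);
    if ((i + 1) % interval === 0) {
      ack(); // e.g. db.runCommand({ getLastError: 1, w: "majority" })
      acks++;
    }
  });
  return acks;
}

var inserted = [];
var ackCount = bulkLoad([1, 2, 3, 4, 5], 2,
  function (d) { inserted.push(d); },
  function () {});
// ackCount is 2: acknowledgment requested after the 2nd and 4th documents
```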
For more information see:
All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both “directions.” Networking topologies and firewall configurations can prevent normal and required connectivity, which can block replication.
Consider the following example of a bidirectional test of networking:
Example
Given a replica set with three members running on three separate hosts:
Test the connection from m1.example.net to the other hosts with the following operations, issued from m1.example.net:
mongo --host m2.example.net --port 27017
mongo --host m3.example.net --port 27017
Test the connection from m2.example.net to the other two hosts with the following operations, issued from m2.example.net:
mongo --host m1.example.net --port 27017
mongo --host m3.example.net --port 27017
You have now tested the connection between m2.example.net and m1.example.net in both directions.
Test the connection from m3.example.net to the other two hosts with the following operations, issued from the m3.example.net host:
mongo --host m1.example.net --port 27017
mongo --host m2.example.net --port 27017
If any connection, in any direction fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.
A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.
To check the size of the oplog for a given replica set member, connect to the member in a mongo shell and run the db.printReplicationInfo() method.
The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10MB and is able to fit about 26 hours (94400 seconds) of operations:
configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week’s worth of operations.
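A rough sizing estimate can be derived from the printReplicationInfo output above. The formula and function name here are an illustration under the assumption that the observed write rate stays roughly constant; this is not an official sizing tool:

```javascript
// Sketch: estimate the oplog size needed to cover a target window,
// given an observed rate of oplog growth in MB per hour. The rate
// comes from watching db.printReplicationInfo(). Illustrative only.
function requiredOplogMB(mbPerHour, targetHours) {
  return mbPerHour * targetHours;
}

// ~10 MB covering ~26 hours implies roughly 0.385 MB/hour; covering a
// 72-hour window would then need an oplog of about 28 MB.
var rate = 10 / 26;
Math.ceil(requiredOplogMB(rate, 72)); // 28
```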
For more information on how oplog size affects operations, see:
Note
You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.
To change oplog size, see Changing Oplog Size in this document or see the Change the Size of the Oplog tutorial.
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.
While failover is automatic, replica set administrators should still understand exactly how this process works. The sections below describe failover in detail.
In most cases, failover occurs without administrator intervention seconds after the primary either steps down, becomes inaccessible, or becomes otherwise ineligible to act as primary. If your MongoDB deployment does not failover according to expectations, consider the following operational errors:
In many senses, rollbacks represent a graceful recovery from an impossible failover and recovery situation.
Rollbacks occur when a primary accepts writes that other members of the set do not successfully replicate before the primary steps down. When the former primary begins replicating again it performs a “rollback.” Rollbacks remove those operations from the instance that were never replicated to the set so that the data set is in a consistent state. The mongod program writes rolled back data to a BSON file that you can view using bsondump and apply manually using mongorestore.
You can prevent rollbacks by using a replica acknowledged write concern, which requires not only the primary but also a majority of the set to confirm a write operation before returning.
See also
The Elections section in the Replication Fundamentals document, and the Election Internals section in the Replication Internals document.
Consider the following error in mongod output and logs:
replSet error fatal couldn't query the local local.oplog.rs collection. Terminating mongod after 30 seconds.
<timestamp> [rsStart] bad replSet oplog entry?
The most common cause of this error is that the value of the ts field in the last oplog entry is of the wrong data type. The correct data type is Timestamp.
You can check the data type by running the following two queries against the oplog. If the data is properly typed, the queries return the same document; otherwise, they return different documents:
db.oplog.rs.find().sort({$natural:-1}).limit(1)
db.oplog.rs.find({ts:{$type:17}}).sort({$natural:-1}).limit(1)
The first query returns the last document in the oplog, while the second returns the last document in the oplog where the ts value is a Timestamp. The $type operator allows you to select documents where ts is BSON type 17, the Timestamp data type.
If the queries don’t return the same document, then the last document in the oplog has the wrong data type in the ts field.
Example
If the first query returns this as the last oplog entry:
{ "ts" : {t: 1347982456000, i: 1}, "h" : NumberLong("8191276672478122996"), "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 4 } }
And the second query returns this as the last entry where ts has the Timestamp type:
{ "ts" : Timestamp(1347982454000, 1), "h" : NumberLong("6188469075153256465"), "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 3 } }
Then the value for the ts field in the last oplog entry is of the wrong data type.
To set the proper type for this value and resolve this issue, use an update operation that resembles the following:
db.oplog.rs.update({ts:{t:1347982456000,i:1}}, {$set:{ts:new Timestamp(1347982456000, 1)}})
Modify the timestamp values as needed based on your oplog entry. This operation may take some time to complete because the update must scan and pull the entire oplog into memory.
The duplicate key on local.slaves error occurs when a secondary or slave changes its hostname and the primary or master tries to update its local.slaves collection with the new name. The update fails because it contains the same _id value as the document containing the previous hostname. The error itself will resemble the following:
exception 11000 E11000 duplicate key error index: local.slaves.$_id_ dup key: { : ObjectId('<object ID>') } 0ms
This is a benign error and does not affect replication operations on the secondary or slave.
To prevent the error from appearing, drop the local.slaves collection from the primary or master, with the following sequence of operations in the mongo shell:
use local
db.slaves.drop()
The next time a secondary or slave polls the primary or master, the primary or master recreates the local.slaves collection.
There is no single ideal replica set architecture for every deployment or environment. Indeed the flexibility of replica sets might be their greatest strength. This document describes the most commonly used deployment patterns for replica sets. The descriptions are necessarily not mutually exclusive, and you can combine features of each architecture in your own deployment.
For an overview of operational practices and background information, see the Architectures topic in the Replication Fundamentals document.
The minimum recommended architecture for a replica set consists of:
One primary and
Two secondary members, either of which can become the primary at any time.
This makes failover possible and ensures that two full and independent copies of the data set exist at all times. If the primary fails, the replica set elects another member as primary and continues replication until the primary recovers.
Note
While not recommended, the minimum supported configuration for replica sets includes one primary, one secondary, and one arbiter. The arbiter requires fewer resources and lowers costs but sacrifices operational flexibility and redundancy.
See also
To increase redundancy or to provide additional resources for distributing secondary read operations, you can add additional members to a replica set.
When adding additional members, ensure the following architectural conditions are true:
The set has an odd number of voting members.
If you have an even number of voting members, deploy an arbiter to create an odd number.
The set has no more than 7 voting members at a time.
Members that cannot function as primaries in a failover have their priority values set to 0.
If a member cannot function as a primary because of resource or network latency constraints, a priority value of 0 prevents it from becoming primary. Any member with a priority value greater than 0 is available to become primary.
A majority of the set’s members operate in the main data center.
See also
A geographically distributed replica set provides data recovery should one data center fail. These sets include at least one member in a secondary data center. That member has its priority set to 0 to prevent the member from ever becoming primary.
In many circumstances, these deployments consist of the following:
If the primary is unavailable, the replica set will elect a new primary from the primary data center.
If the connection between the primary and secondary data centers fails, the member in the secondary center cannot independently become the primary.
If the primary data center fails, you can manually recover the data set from the secondary data center. With appropriate write concern there will be no data loss and downtime can be minimal.
When you add a secondary data center, keep an odd number of members overall to prevent ties during elections for primary. For example, if you have three members in the primary data center and add a member in a secondary center, you create an even number. To restore an odd number and prevent ties, deploy an arbiter in your primary data center.
In some cases it may be useful to maintain a member that has an always up-to-date copy of the entire data set but that cannot become primary. You might create such a member to provide backups, to support reporting operations, or to act as a cold standby. Such members fall into one or more of the following categories:
Note
All members of a replica set vote in elections except for non-voting members. Priority, hidden, or delayed status does not affect a member’s ability to vote in an election.
For some deployments, keeping a replica set member for dedicated backup purposes is operationally advantageous. Ensure this member is close, from a networking perspective, to the primary or likely primary, and ensure that the replication lag is minimal or non-existent. You can create a dedicated hidden member for this purpose.
If this member runs with journaling enabled, you can safely use standard block level backup methods to create a backup of this member. Otherwise, if your underlying system does not support snapshots, you can connect mongodump to create a backup directly from the secondary member. In these cases, use the --oplog option to ensure a consistent point-in-time dump of the database state.
See also
Delayed members are special mongod instances in a replica set that apply operations from the oplog on a delay to provide a running “historical” snapshot of the data set, or a rolling backup. Typically these members provide protection against human error, such as unintentionally deleted databases and collections or failed application upgrades or migrations.
Otherwise, delayed members function identically to secondary members, with the following operational differences: they are not eligible for election to primary and do not receive secondary read queries. Delayed members do vote in elections for primary.
See Replica Set Delayed Nodes for more information about configuring delayed replica set members.
Typically hidden members provide a substrate for reporting purposes, because the replica set segregates these instances from the cluster. Since no secondary reads reach hidden members, they receive no traffic beyond what replication requires. While hidden members are not electable as primary, they are still able to vote in elections for primary. If your operational parameters require this kind of reporting functionality, see Hidden Replica Set Nodes and members[n].hidden for more information.
For some sets, it may not be possible to initialize a new member in a reasonable amount of time. In these situations, it may be useful to maintain a secondary member with an up-to-date copy for the purpose of replacing another member in the replica set. In most cases, these members can be ordinary members of the replica set, but in large sets, with varied hardware availability, or given some patterns of geographical distribution, you may want to use a member with a different priority, hidden, or voting status.
Cold standbys may be valuable when your primary and “hot standby” secondary members have different hardware specifications or connect via a different network than the main set. In these cases, deploy members with priority equal to 0 to ensure that they will never become primary. These members will vote in elections for primary but will never be eligible for election to primary. Consider likely failover scenarios, such as inter-site network partitions, and ensure there will be members eligible for election as primary and a quorum of voting members in the main facility.
Note
If your set already has 7 members, set the members[n].votes value to 0 for these members, so that they won’t vote in elections.
See also
Secondary Only, and Hidden Nodes.
Deploy an arbiter to ensure that a replica set will have a sufficient number of members to elect a primary. While having replica sets with 2 members is not recommended for production environments, if you have just two members, deploy an arbiter. Also, for any replica set with an even number of members, deploy an arbiter.
To deploy an arbiter, see the Arbiters topic in the Replica Set Administration document.
From the perspective of a client application, whether a MongoDB instance is running as a single server (i.e. “standalone”) or a replica set is transparent. However, replica sets offer some configuration options for write and read operations. [1] This document describes those options and their implications.
| [1] | Sharded clusters where the shards are also replica sets provide the same configuration options with regards to write and read operations. |
MongoDB’s built-in write concern confirms the success of write operations to a replica set’s primary. Write concern issues the getLastError command after write operations to return an object with error information or confirmation that there are no errors.
The MongoDB Fall 2012 driver release enables write concern by default.
When enabled, write concern confirms write operations only on the primary by default. You can configure write concern to confirm write operations to additional replica set members as well by issuing the getLastError command with the w option.
The w option confirms that write operations have replicated to the specified number of replica set members, including the primary. You can either specify a number or specify majority, which ensures the write propagates to a majority of set members. The following example ensures the operation has replicated to two members (the primary and one other member):
db.runCommand( { getLastError: 1, w: 2 } )
The following example ensures the write operation has replicated to a majority of the configured members of the set.
db.runCommand( { getLastError: 1, w: "majority" } )
If you specify a w value greater than the number of members that hold a copy of the data (i.e., greater than the number of non-arbiter members), the operation blocks until those members become available. This can cause the operation to block forever. To specify a timeout threshold for the getLastError operation, use the wtimeout argument. The following example sets the timeout to 5000 milliseconds:
db.runCommand( { getLastError: 1, w: 2, wtimeout: 5000 } )
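To make the blocking behavior concrete, here is a minimal Python simulation (not driver or server code; the reply shape is only loosely modeled on the command's output) of how a w/wtimeout pair resolves, assuming we know when each data-bearing member would acknowledge the write:

```python
def get_last_error(ack_times_ms, w, wtimeout_ms=None):
    """Simulate resolving a getLastError call.

    ack_times_ms: times (ms after the write) at which each data-bearing
    member acknowledges the write; the primary acknowledges at time 0.
    """
    acks = sorted(ack_times_ms)
    if w == "majority":
        w = len(acks) // 2 + 1          # strict majority of data-bearing members
    if w > len(acks):
        # More acknowledgements requested than data-bearing members exist:
        # without a timeout this wait can never be satisfied.
        if wtimeout_ms is None:
            return {"ok": 0, "err": "would block forever"}
        return {"ok": 1, "err": "timeout", "wtimeout": True}
    satisfied_at = acks[w - 1]          # when the w-th acknowledgement arrives
    if wtimeout_ms is not None and satisfied_at > wtimeout_ms:
        return {"ok": 1, "err": "timeout", "wtimeout": True}
    return {"ok": 1, "err": None, "wtime": satisfied_at}
```

For example, with a primary acknowledging at 0 ms and secondaries at 40 ms and 9000 ms, w: 2 resolves at 40 ms, while w: 3 with a 5000 ms wtimeout times out.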
You can configure your own “default” getLastError behavior for a replica set. Use the settings.getLastErrorDefaults setting in the replica set configuration. The following sequence of commands creates a configuration that waits for the write operation to complete on a majority of the set members before returning:
cfg = rs.conf()
cfg.settings = {}
cfg.settings.getLastErrorDefaults = {w: "majority"}
rs.reconfig(cfg)
The settings.getLastErrorDefaults setting affects only those getLastError commands that have no other arguments.
Note
Use of inappropriate write concern can lead to rollbacks in the case of replica set failover. Always ensure that your operations have specified the required write concern for your application.
Read preference describes how MongoDB clients route read operations to members of a replica set.
By default, an application directs its read operations to the primary member in a replica set. Reading from the primary guarantees that read operations reflect the latest version of a document. However, for an application that does not require fully up-to-date data, you can improve read throughput, or reduce latency, by distributing some or all reads to secondary members of the replica set.
The following are use cases where you might use secondary reads:
MongoDB drivers allow client applications to configure a read preference on a per-connection, per-collection, or per-operation basis. For more information about secondary read operations in the mongo shell, see the readPref() method. For more information about a driver’s read preference configuration, see the appropriate Drivers API documentation.
Note
Read preferences affect how an application selects which member to use for read operations. As a result, read preferences dictate whether the application receives stale or current data from MongoDB. Use appropriate write concern policies to ensure proper data replication and consistency.
If read operations account for a large percentage of your application’s traffic, distributing reads to secondary members can improve read throughput. However, in most cases sharding provides better support for larger scale operations, as clusters can distribute read and write operations across a group of machines.
New in version 2.2.
MongoDB drivers support five read preference modes:
You can specify a read preference mode on connection objects, database object, collection object, or per-operation. The syntax for specifying the read preference mode is specific to the driver and to the idioms of the host language.
Read preference modes are also available to clients connecting to a sharded cluster through a mongos. The mongos instance obeys specified read preferences when connecting to the replica set that provides each shard in the cluster.
In the mongo shell, the readPref() cursor method provides access to read preferences.
Warning
All read preference modes except primary may return stale data as secondaries replicate operations from the primary with some delay.
Ensure that your application can tolerate stale data if you choose to use a non-primary mode.
For more information, see read preference background and read preference behavior. See also the documentation for your driver.
All read operations use only the current replica set primary. This is the default. If the primary is unavailable, read operations produce an error or throw an exception.
The primary read preference mode is not compatible with read preference modes that use tag sets. If you specify a tag set with primary, the driver produces an error.
In most situations, operations read from the primary member of the set. However, if the primary is unavailable, as is the case during failover situations, operations read from secondary members.
When the read preference includes a tag set, the client reads first from the primary, if available, and then from secondaries that match the specified tags. If no secondaries have matching tags, the read operation produces an error.
Since the application may receive data from a secondary, read operations using the primaryPreferred mode may return stale data in some situations.
Warning
Changed in version 2.2: mongos added full support for read preferences.
When connecting to a mongos instance older than 2.2 with a client that supports read preference modes, primaryPreferred will send queries to secondaries.
Operations read only from the secondary members of the set. If no secondaries are available, then this read operation produces an error or exception.
Most sets have at least one secondary, but there are situations where there may be no available secondary. For example, a set with a primary, a secondary, and an arbiter may not have any secondaries if a member is in recovering state or unavailable.
When the read preference includes a tag set, the client attempts to find secondary members that match the specified tag set and directs reads to a random secondary from among the nearest group. If no secondaries have matching tags, the read operation produces an error. [2]
Read operations using the secondary mode may return stale data.
In most situations, operations read from secondary members, but in situations where the set consists of a single primary (and no other members), the read operation will use the set’s primary.
When the read preference includes a tag set, the client attempts to find a secondary member that matches the specified tag set and directs reads to a random secondary from among the nearest group. If no secondaries have matching tags, the read operation produces an error.
Read operations using the secondaryPreferred mode may return stale data.
The driver reads from the nearest member of the set according to the member selection process. Reads in the nearest mode do not consider the member’s type. Reads in nearest mode may read from both primaries and secondaries.
Set this mode to minimize the effect of network latency on read operations without preference for current or stale data.
If you specify a tag set, the client attempts to find a secondary member that matches the specified tag set and directs reads to a random secondary from among the nearest group.
Read operations using the nearest mode may return stale data.
Note
All operations read from a member of the nearest group of the replica set that matches the specified read preference mode. The nearest mode prefers low latency reads over a member’s primary or secondary status.
For nearest, the client assembles a list of acceptable hosts based on tag set and then narrows that list to the host with the shortest ping time and all other members of the set that are within the “local threshold,” or acceptable latency. See Member Selection for more information.
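The selection process described above can be sketched as a toy Python model (this is a simplification, not driver code: real drivers also track member health, connection state, and staleness). It applies the mode, then the tag filter, then the latency window around the nearest member:

```python
import random

ACCEPTABLE_LATENCY_MS = 15  # the drivers' default "local threshold"

def select_member(members, mode="primary", tags=None):
    """Toy model of read-preference member selection.

    Each member is a dict: {"host", "is_primary", "ping_ms", "tags"}.
    """
    tags = tags or {}

    def tag_match(m):
        return all(m.get("tags", {}).get(k) == v for k, v in tags.items())

    primaries = [m for m in members if m["is_primary"]]
    secondaries = [m for m in members if not m["is_primary"]]

    if mode == "primary":
        return primaries[0] if primaries else None
    if mode == "primaryPreferred" and primaries:
        return primaries[0]

    if mode == "nearest":
        pool = [m for m in members if tag_match(m)]   # may include the primary
    else:  # secondary, secondaryPreferred, or primaryPreferred fallback
        pool = [m for m in secondaries if tag_match(m)]

    if not pool:
        if mode == "secondaryPreferred" and primaries:
            return primaries[0]
        return None

    # Narrow to members within the latency window of the nearest candidate,
    # then pick one at random to distribute load across the nearest group.
    nearest = min(m["ping_ms"] for m in pool)
    window = [m for m in pool if m["ping_ms"] <= nearest + ACCEPTABLE_LATENCY_MS]
    return random.choice(window)
```

With a primary at 5 ms and secondaries at 10 ms and 80 ms, a plain secondary read deterministically targets the 10 ms secondary (the 80 ms one falls outside the 15 ms window), while nearest picks at random between the primary and the 10 ms secondary.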
| [2] | If your set has more than one secondary, and you use the secondary read preference mode, consider the following effect. If you have a three member replica set with a primary and two secondaries, and if one secondary becomes unavailable, all secondary queries must target the remaining secondary. This will double the load on this secondary. Plan and provide capacity to support this as needed. |
Tag sets allow you to specify custom read preferences so that your application can target read operations to specific members, based on custom parameters. A tag set for a read operation may resemble the following:
{ "disk": "ssd", "use": "reporting" }
To fulfill the request, a member would need to have both of these tags. Therefore, the following tag sets would satisfy this requirement:
{ "disk": "ssd", "use": "reporting" }
{ "disk": "ssd", "use": "reporting", "rack": 1 }
{ "disk": "ssd", "use": "reporting", "rack": 4 }
{ "disk": "ssd", "use": "reporting", "mem": "64"}
However, the following tag sets would not be able to fulfill this query:
{ "disk": "ssd" }
{ "use": "reporting" }
{ "disk": "ssd", "use": "production" }
{ "disk": "ssd", "use": "production", "rack": 3 }
{ "disk": "spinning", "use": "reporting", "mem": "32" }
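The matching rule behind these examples is a simple superset check: a member satisfies the request if it carries every requested tag with the same value. A minimal Python sketch:

```python
def satisfies(requested_tags, member_tags):
    """True if member_tags contain every requested tag with the same value."""
    return all(member_tags.get(k) == v for k, v in requested_tags.items())

request = {"disk": "ssd", "use": "reporting"}
```

Extra tags on the member (such as "rack" or "mem" above) do not affect the match; a missing or mismatched value for any requested tag disqualifies the member.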
Therefore, tag sets make it possible to ensure that read operations target specific members in a particular data center or mongod instances designated for a particular class of operations, such as reporting or analytics. For information on configuring tag sets, see Tag Sets in the Replica Set Configuration document. You can specify tag sets with the following read preference modes:
You cannot specify tag sets with the primary read preference mode.
Tags are not compatible with primary and only apply when selecting a secondary member of a set for a read operation. However, the nearest read mode, when combined with a tag set, will select the nearest member that matches the specified tag set, which may be a primary or secondary.
All interfaces use the same member selection logic to choose the member to which to direct read operations, basing the choice on read preference mode and tag sets.
For more information on how read preferences modes interact with tag sets, see the documentation for each read preference mode.
Changed in version 2.2.
Connection between MongoDB drivers and mongod instances in a replica set must balance two concerns:
As a result, MongoDB drivers and mongos:
Reuse a connection to specific mongod for as long as possible after establishing a connection to that instance. This connection is pinned to this mongod.
Attempt to reconnect to a new member, obeying existing read preference modes, if the connection to a mongod is lost.
Reconnections are transparent to the application itself. If the connection permits reads from secondary members, after reconnecting, the application can receive two sequential reads returning from different secondaries. Depending on the state of the individual secondary member’s replication, the documents can reflect the state of your database at different moments.
Return an error only after attempting to connect to three members of the set that match the read preference mode and tag set. If there are fewer than three members of the set, the client will error after connecting to all existing members of the set.
After this error, the driver selects a new member using the specified read preference mode. In the absence of a specified read preference, the driver uses primary.
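The retry rule above can be sketched as follows (a simulation, not actual driver code): the client tries up to three matching members, or all of them if fewer exist, before surfacing an error:

```python
def read_with_retry(candidates, try_read):
    """Attempt a read against up to three candidate members.

    candidates: members that match the read preference mode and tag set.
    try_read: callable taking a member; returns a result or raises
    ConnectionError on failure.
    """
    last_error = None
    for member in candidates[:3]:       # at most three attempts
        try:
            return try_read(member)
        except ConnectionError as exc:
            last_error = exc            # remember the failure, try the next
    raise last_error or ConnectionError("no members available")
```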
After detecting a failover situation, [3] the driver attempts to refresh the state of the replica set as quickly as possible.
| [3] | When a failover occurs, all members of the set close all client connections that produce a socket error in the driver. This behavior prevents or minimizes rollback. |
Reads from secondary may reflect the state of the data set at different points in time because secondary members of a replica set may lag behind the current state of the primary by different amounts. To prevent subsequent reads from jumping around in time, the driver can associate application threads to a specific member of the set after the first read. The thread will continue to read from the same member until:
If an application thread issues a query with the primaryPreferred mode while the primary is inaccessible, the thread will carry the association with that secondary for the lifetime of the thread. The thread will associate with the primary, if available, only after issuing a query with a different read preference, even if a primary becomes available. By extension, if a thread issues a read with the secondaryPreferred when all secondaries are down, it will carry an association with the primary. This application thread will continue to read from the primary even if a secondary becomes available later in the thread’s lifetime.
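A loose Python sketch of this pinning behavior (a toy model only; the class and its callback are hypothetical): a thread keeps its first-selected member until it issues a read with a different preference or the connection drops:

```python
class RequestAssociation:
    """Toy model of per-thread member pinning."""

    def __init__(self, select):
        self.select = select           # callable: mode -> member
        self.pinned = {}               # thread_id -> (mode, member)

    def member_for(self, thread_id, mode):
        pinned = self.pinned.get(thread_id)
        if pinned and pinned[0] == mode:
            return pinned[1]           # same preference: keep reading here
        member = self.select(mode)     # new or changed preference: reselect
        self.pinned[thread_id] = (mode, member)
        return member

    def on_disconnect(self, thread_id):
        self.pinned.pop(thread_id, None)  # a broken connection unpins
```

Note how this reproduces the behavior described above: a primaryPreferred read issued while the primary is down pins the thread to a secondary, and the thread stays there even after the primary returns, until it reads with a different preference.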
Clients, by way of their drivers, and mongos instances for sharded clusters periodically update their view of the set’s state: which members are up or down, which is primary, and the latency to each mongod instance.
For any operation that targets a member other than the primary, the driver:
Once the application selects a member of the set to use for read operations, the driver continues to use this connection for read preference until the application specifies a new read preference or something interrupts the connection. See Request Association for more information.
| [4] | Applications can configure the threshold used in this stage. The default “acceptable latency” is 15 milliseconds, which you can override in the drivers with their own secondaryAcceptableLatencyMS option. For mongos you can use the --localThreshold or localThreshold runtime options to set this value. |
Changed in version 2.2: Before version 2.2, mongos did not support the read preference mode semantics.
In most sharded clusters, each shard is provided by a replica set, where read preferences are also applicable. Read operations in a sharded cluster, with regard to read preference, are identical to unsharded replica sets.
Unlike simple replica sets, in sharded clusters, all interactions with the shards pass from the clients to the mongos instances that are actually connected to the set members. mongos is responsible for the application of the read preferences, which is transparent to applications.
There are no configuration changes required for full support of read preference modes in sharded environments, as long as the mongos is at least version 2.2. All mongos maintain their own connection pool to the replica set members. As a result:
A request without a specified preference uses primary, the default, unless the mongos reuses an existing connection that has a different mode set.
Always explicitly set your read preference mode to prevent confusion.
All nearest and latency calculations reflect the connection between the mongos and the mongod instances, not the client and the mongod instances.
This produces the desired result, because all results must pass through the mongos before returning to the client.
Because some database commands read and return data from the database, all of the official drivers support full read preference mode semantics for the following commands:
| [5] | Only “inline” mapReduce operations that do not write data support read preference; otherwise these operations must run on the primary. |
mongos currently does not route commands using read preferences; clients send all commands to shards’ primaries. See SERVER-7423.
You must exercise care when specifying read preference: modes other than primary can and will return stale data. These secondary queries will not include most recent write operations to the replica set’s primary. Nevertheless, there are several common use cases for using non-primary read preference modes:
Reporting and analytics workloads.
Having these queries target a secondary helps distribute load and prevent these operations from affecting the main workload of the primary.
Also consider using secondary in conjunction with a direct connection to a hidden member of the set.
Providing local reads for geographically distributed applications.
If you have application servers in multiple data centers, you may consider having a geographically distributed replica set and using a non primary read preference or the nearest to avoid network latency.
Maintaining availability during a failover.
Use primaryPreferred if you want your application to do consistent reads from the primary under normal circumstances, but to allow stale reads from secondaries in an emergency. This provides a “read-only mode” for your application during a failover.
Warning
In some situations using secondaryPreferred to distribute read load to replica sets may carry significant operational risk: if all secondaries are unavailable and your set has enough arbiters to prevent the primary from stepping down, then the primary will receive all traffic from clients.
For this reason, use secondary to distribute read load to replica sets, not secondaryPreferred.
Using read modes other than primary and primaryPreferred merely to provide extra capacity is, in many cases, not justified on its own. Furthermore, sharding increases read and write capacity by distributing read and write operations across a group of machines.
This document provides a more in-depth explanation of the internals and operation of replica set features. This material is not necessary for normal operation or application development but may be useful for troubleshooting and for further understanding MongoDB’s behavior and approach.
For additional information about the internals of replication and replica sets, see the following resources in the MongoDB Manual:
For an explanation of the oplog, see Oplog.
Under various exceptional situations, updates to a secondary’s oplog might lag behind the desired performance time. See Replication Lag for details.
All members of a replica set send heartbeats (pings) to all other members in the set and can import operations to the local oplog from any other member in the set.
Replica set oplog operations are idempotent. The following operations require idempotency:
MongoDB uses single-master replication to ensure that the database remains consistent. However, clients may modify the read preferences on a per-connection basis in order to distribute read operations to the secondary members of a replica set. Read-heavy deployments may achieve greater query throughput by distributing reads to secondary members. But keep in mind that replication is asynchronous; therefore, reads from secondaries may not always reflect the latest writes to the primary.
See also
Note
Use db.getReplicationInfo() from a secondary member and the replication status output to assess the current state of replication and determine if there is any unintended replication delay.
Replica sets can include members with the following four special configurations that affect membership behavior:
In almost every case, replica sets simplify the process of administering database replication. However, replica sets still have a unique set of administrative requirements and concerns. Choosing the right system architecture for your data set is crucial.
See also
The Member Configurations topic in the Replica Set Administration document.
Administrators of replica sets also have unique monitoring and security concerns. The replica set functions in the mongo shell provide the tools necessary for replica set administration. In particular, use rs.conf() to return a document that holds the replica set configuration and use rs.reconfig() to modify the configuration of an existing replica set.
Elections are the process replica set members use to select which member should become primary. A primary is the only member in the replica set that can accept write operations, including insert(), update(), and remove().
The following events can trigger an election:
In an election, all members have one vote, including hidden members, arbiters, and even recovering members. Any mongod can veto an election.
In the default configuration, all members have an equal chance of becoming primary; however, it’s possible to set priority values that weight the election. In some architectures, there may be operational reasons for increasing the likelihood of a specific replica set member becoming primary. For instance, a member located in a remote data center should not become primary. See: Member Priority for more information.
Any member of a replica set can veto an election, even if the member is a non-voting member.
A member of the set will veto an election under the following conditions:
The first member to receive votes from a majority of members in a set becomes the next primary until the next election. Be aware of the following conditions and possible situations:
| [1] | Remember that hidden and delayed imply secondary-only configuration. |
Members on either side of a network partition cannot see each other when determining whether a majority is available to hold an election.
That means that if a primary steps down and neither side of the partition has a majority on its own, the set will not elect a new primary and the set will become read only. The best practice is to have a majority of servers in one data center and one server in another.
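The majority rule can be sketched numerically. A partition can elect a primary only if it holds a strict majority of the set's votes:

```python
def can_elect(partition_votes, total_votes):
    """True if a partition holds a strict majority of the set's votes."""
    return partition_votes > total_votes // 2
```

With three voting members split 2-1 across a partition, the two-member side can elect a primary (2 > 1) while the lone member cannot. With four members split 2-2, neither side holds a majority, and the set becomes read only until the partition heals.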
In order to remain up-to-date with the current state of the replica set, set members sync, or copy, oplog entries from other members.
When a new member joins a set or an existing member restarts, the member waits to receive heartbeats from other members. By default, the member syncs from the closest member of the set that is either the primary or another secondary with more recent oplog entries. This prevents two secondaries from syncing from each other.
In version 2.0, secondaries only change sync targets if the connection between secondaries drops or produces an error.
For example:
If you have two secondary members in one data center and a primary in a second facility, and if you start all three instances at roughly the same time (i.e. with no existing data sets or oplog,) both secondaries will likely sync from the primary, as neither secondary has more recent oplog entries.
If you restart one of the secondaries, then when it rejoins the set it will likely begin syncing from the other secondary, because of proximity.
If you have a primary in one facility and a secondary in an alternate facility, and if you add another secondary to the alternate facility, the new secondary will likely sync from the existing secondary because it is closer than the primary.
See also
Note
This table is for archival purposes and does not list all features of replica sets. Always use the latest stable release of MongoDB in production deployments.
| Features | Version |
|---|---|
| Slave Delay | 1.6.3 |
| Hidden | 1.7 |
| replSetFreeze and replSetStepDown | 1.7.3 |
| Replicated ops in mongostat | 1.7.3 |
| Syncing from Secondaries | 1.8.0 |
| Authentication | 1.8.0 |
| Replication from Nearest Server (by ping Time) | 2.0 |
| replSetSyncFrom support for syncing from specific members. | 2.2 |
Additionally:
This reference collects documentation for all JavaScript methods for the mongo shell that support replica set functionality, as well as all database commands related to replication function.
See Replication, for a list of all replica set documentation.
The following methods apply to replica sets. For a complete list of all methods, see JavaScript Methods.
Returns: A document with status information.
This output reflects the current status of the replica set, using data derived from the heartbeat packets sent by the other members of the replica set.
This method provides a wrapper around the replSetGetStatus database command.
See also
“Replica Set Status Reference” for documentation of this output.
Returns a status document whose fields include the ismaster field, which reports whether the current node is the primary, as well as a report of a subset of current replica set configuration.
This function provides a wrapper around the database command isMaster.
Initiates a replica set. Optionally takes a configuration argument in the form of a document that holds the configuration of a replica set. Consider the following model of the most basic configuration for a 3-member replica set:
{
_id : <setname>,
members : [
{_id : 0, host : <host0>},
{_id : 1, host : <host1>},
{_id : 2, host : <host2>},
]
}
This function provides a wrapper around the “replSetInitiate” database command.
Returns: a document that contains the current replica set configuration object.
rs.config() is an alias of rs.conf().
Initializes a new replica set configuration. This function disconnects the shell briefly and forces a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
rs.reconfig() provides a wrapper around the “replSetReconfig” database command.
rs.reconfig() overwrites the existing replica set configuration. Retrieve the current configuration object with rs.conf(), modify the configuration as needed and then use rs.reconfig() to submit the modified configuration object.
To reconfigure a replica set, use the following sequence of operations:
conf = rs.conf()
// modify conf to change configuration
rs.reconfig(conf)
If you want to force the reconfiguration if a majority of the set isn’t connected to the current member, or you’re issuing the command against a secondary, use the following form:
conf = rs.conf()
// modify conf to change configuration
rs.reconfig(conf, { force: true } )
Warning
Forcing a rs.reconfig() can lead to rollback situations and other difficult to recover from situations. Exercise caution when using this option.
See also
“Replica Set Configuration” and “Replica Set Administration”.
Specify one of the following forms:
Provides a simple method to add a member to an existing replica set. You can specify new hosts in one of two ways:
This function disconnects the shell briefly and forces a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
rs.add() provides a wrapper around some of the functionality of the “replSetReconfig” database command and the corresponding shell helper rs.reconfig(). See the Replica Set Configuration document for full documentation of all replica set configuration options.
Example
To add a mongod accessible on the default port 27017 running on the host mongodb3.example.net, use the following rs.add() invocation:
rs.add('mongodb3.example.net:27017')
If mongodb3.example.net is an arbiter, use the following form:
rs.add('mongodb3.example.net:27017', true)
To add mongodb3.example.net as a secondary-only member of set, use the following form of rs.add():
rs.add( { "host": "mongodbd3.example.net:27017", "priority": 0 } )
See the Replica Set Configuration and Replica Set Administration documents for more information.
Adds a new arbiter to an existing replica set.
This function disconnects the shell briefly and forces a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
Returns: disconnects the shell.
Forces the current replica set member to step down as primary and then attempt to avoid election as primary for the designated number of seconds. Produces an error if the current node is not primary.
This function disconnects the shell briefly and forces a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
rs.stepDown() provides a wrapper around the database command replSetStepDown.
Makes the current node ineligible to become primary for the period specified.
rs.freeze() provides a wrapper around the database command replSetFreeze.
Removes the node described by the hostname parameter from the current replica set. This function disconnects the shell briefly and forces a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
Note
Before running the rs.remove() operation, you must shut down the replica set member that you’re removing.
Changed in version 2.2: This procedure is no longer required when using rs.remove(), but it remains good practice.
Provides a shorthand for the following operation:
db.getMongo().setSlaveOk()
This allows read operations on the current connection to run on secondary nodes. See the readPref() method for more fine-grained control over read preference in the mongo shell.
Returns a status document whose fields include the ismaster field, which reports whether the current node is the primary, as well as a report of a subset of current replica set configuration.
This function provides a wrapper around the database command isMaster.
Returns a basic help text for all of the replication related shell functions.
New in version 2.2.
Provides a wrapper around replSetSyncFrom, which allows administrators to configure the member of a replica set that the current member will pull data from. Specify the name of the member you want to sync from in the form of [hostname]:[port].
See replSetSyncFrom for more details.
The following commands apply to replica sets. For a complete list of all commands, see Command Reference.
The isMaster command provides a basic overview of the current replication configuration. MongoDB drivers and clients use this command to determine what kind of member they’re connected to and to discover additional members of a replica set. The db.isMaster() method provides a wrapper around this database command.
The command takes the following form:
{ isMaster: 1 }
This command returns a document containing the following fields:
The name of the current replica set, if applicable.
A boolean value that reports when this node is writable. If true, then the current node is either a primary node in a replica set, a master node in a master-slave configuration, or a standalone mongod.
A boolean value that, when true, indicates that the current node is a secondary member of a replica set.
An array of strings in the format of “[hostname]:[port]” listing all nodes in the replica set that are not “hidden”.
The [hostname]:[port] for the current replica set primary, if applicable.
The [hostname]:[port] of the node responding to this command.
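Putting those fields together, an illustrative reply from a secondary member (represented here as a Python dict; the hostnames and set name are hypothetical) might look like:

```python
# Hypothetical isMaster reply as seen from a secondary member of set "rs0".
is_master_reply = {
    "setName": "rs0",                    # name of the current replica set
    "ismaster": False,                   # this node is not writable
    "secondary": True,                   # it is a secondary member
    "hosts": [                           # non-hidden members of the set
        "m1.example.net:27017",
        "m2.example.net:27017",
        "m3.example.net:27017",
    ],
    "primary": "m1.example.net:27017",   # the current primary
    "me": "m2.example.net:27017",        # the node answering the command
    "ok": 1,
}
```

Drivers inspect ismaster and secondary to decide what kind of member they are connected to, and use hosts to discover the rest of the set.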
The resync command forces an out-of-date slave mongod instance to re-synchronize itself. Note that this command is relevant to master-slave replication only. It does not apply to replica sets.
Warning
This command obtains a global write lock and will block other operations until it has completed.
The replSetFreeze command prevents a replica set member from seeking election for the specified number of seconds. Use this command in conjunction with the replSetStepDown command to make a different node in the replica set a primary.
The replSetFreeze command uses the following syntax:
{ replSetFreeze: <seconds> }
If you want to unfreeze a replica set member before the specified number of seconds has elapsed, you can issue the command with a seconds value of 0:
{ replSetFreeze: 0 }
Restarting the mongod process also unfreezes a replica set member.
replSetFreeze is an administrative command, and you must issue it against the admin database.
The replSetGetStatus command returns the status of the replica set from the point of view of the current server. You must run the command against the admin database. The command has the following prototype format:
{ replSetGetStatus: 1 }
However, you can also run this command from the shell like so:
rs.status()
See also
“Replica Set Status Reference” and “Replication Fundamentals“
The replSetInitiate command initializes a new replica set. Use the following syntax:
{ replSetInitiate : <config_document> }
The <config_document> is a document that specifies the replica set’s configuration. For instance, here’s a config document for creating a simple 3-member replica set:
{
_id : <setname>,
members : [
{_id : 0, host : <host0>},
{_id : 1, host : <host1>},
{_id : 2, host : <host2>},
]
}
A typical way of running this command is to assign the config document to a variable and then to pass the document to the rs.initiate() helper:
config = {
_id : "my_replica_set",
members : [
{_id : 0, host : "rs1.example.net:27017"},
{_id : 1, host : "rs2.example.net:27017"},
{_id : 2, host : "rs3.example.net", arbiterOnly: true},
]
}
rs.initiate(config)
Notice that omitting the port causes the host to use the default port of 27017. Notice also that you can specify other options in the config document, such as the ``arbiterOnly`` setting in this example.
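Before passing a config document to rs.initiate(), it can help to understand the basic shape the server expects. The following plain-JavaScript sketch performs a few illustrative sanity checks (a set name, a non-empty members array, unique numeric member _id values); the actual server performs many more validations:

```javascript
// Sketch: minimal, illustrative checks on a replica set config document.
// The real server validates far more than this.
function validateConfig(config) {
  if (typeof config._id !== "string" || config._id.length === 0) {
    return "missing set name (_id)";
  }
  if (!Array.isArray(config.members) || config.members.length === 0) {
    return "members array is required";
  }
  const seen = new Set();
  for (const m of config.members) {
    if (typeof m._id !== "number") return "each member needs a numeric _id";
    if (seen.has(m._id)) return "duplicate member _id: " + m._id;
    seen.add(m._id);
    if (typeof m.host !== "string") return "each member needs a host string";
  }
  return null; // looks structurally valid
}

const config = {
  _id: "my_replica_set",
  members: [
    { _id: 0, host: "rs1.example.net:27017" },
    { _id: 1, host: "rs2.example.net:27017" },
    { _id: 2, host: "rs3.example.net", arbiterOnly: true }
  ]
};

console.log(validateConfig(config)); // null
```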
See also
“Replica Set Configuration,” “Replica Set Administration,” and “Replica Set Reconfiguration.”
The replSetMaintenance admin command enables or disables the maintenance mode for a secondary member of a replica set.
The command has the following prototype form:
{ replSetMaintenance: <boolean> }
Consider the following behavior when running the replSetMaintenance command:
The replSetReconfig command modifies the configuration of an existing replica set. You can use this command to add and remove members, and to alter the options set on existing members. Use the following syntax:
{ replSetReconfig: <new_config_document>, force: false }
You may also run the command using the shell’s rs.reconfig() method.
Be aware of the following replSetReconfig behaviors:
You must issue this command against the admin database of the current primary member of the replica set.
You can optionally force the replica set to accept the new configuration by specifying force: true. Use this option if the current member is not primary or if a majority of the members of the set are not accessible.
Warning
Forcing the replSetReconfig command can lead to a rollback situation. Use with caution.
Use the force option to restore a replica set to new servers with different hostnames. This works even if the set members already have a copy of the data.
A majority of the set’s members must be operational for the changes to propagate properly.
This command can cause downtime as the set renegotiates primary-status. Typically this is 10-20 seconds, but could be as long as a minute or more. Therefore, you should attempt to reconfigure only during scheduled maintenance periods.
In some cases, replSetReconfig forces the current primary to step down, initiating an election for primary among the members of the replica set. When this happens, the set will drop all current connections.
Note
replSetReconfig obtains a special mutually exclusive lock to prevent more than one replSetReconfig operation from occurring at the same time.
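The usual workflow is to fetch the current configuration with rs.conf(), edit it, and submit it with rs.reconfig(). The following plain-JavaScript sketch mimics the edit step on a local copy of a config document; the helper function is illustrative, and the version bump shown is the kind of increment the shell helper applies for you:

```javascript
// Sketch: prepare an edited copy of a replica set config document, as
// you would before calling rs.reconfig(cfg). Illustrative only.
function prepareReconfig(currentConfig, memberId, changes) {
  // Work on a deep copy so the original document is untouched.
  const cfg = JSON.parse(JSON.stringify(currentConfig));
  const member = cfg.members.find(m => m._id === memberId);
  if (!member) throw new Error("no member with _id " + memberId);
  Object.assign(member, changes);
  // Each new configuration must carry a higher version number.
  cfg.version = (cfg.version || 1) + 1;
  return cfg;
}

const current = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" }
  ]
};

const next = prepareReconfig(current, 1, { priority: 0 });
console.log(next.version);             // 2
console.log(next.members[1].priority); // 0
```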
New in version 2.2.
replSetSyncFrom allows you to explicitly configure which host the current mongod will poll oplog entries from. This operation may be useful for testing different patterns and in situations where a set member is not syncing from the host you want. The member to sync from must be a valid source for data in the set; a member of a replica set cannot sync from:
If you attempt to sync from a member that is more than 10 seconds behind the current member, mongod will return and log a warning, but it will still sync from such members.
The command has the following prototype form:
{ replSetSyncFrom: "[hostname]:[port]" }
To run the command in the mongo shell, use the following invocation:
db.adminCommand( { replSetSyncFrom: "[hostname]:[port]" } )
You may also use the rs.syncFrom() helper in the mongo shell, in an operation with the following form:
rs.syncFrom("[hostname]:[port]")
Note
replSetSyncFrom provides a temporary override of default behavior. If the mongod instance restarts or the connection it uses to sync closes, the mongod will revert to the default logic for selecting a sync source.
The following document describes master-slave replication, which is deprecated:
Deprecated since version 1.6: Replica sets replace master-slave replication. Use replica sets rather than master-slave replication for all new production deployments.
Replica sets provide a functional superset of master-slave replication and are more robust for production use. Master-slave replication preceded replica sets; it makes it possible to have a large number of non-master (i.e. slave) instances and to replicate operations for only a single database. However, master-slave replication provides less redundancy and does not automate failover. See Deploy Master-Slave Equivalent using Replica Sets for a replica set configuration that is equivalent to master-slave replication.
Warning
This documentation remains available to support legacy deployments and for archival purposes only.
To configure a master-slave deployment, start two mongod instances: one in master mode, and the other in slave mode.
To start a mongod instance in master mode, invoke mongod as follows:
mongod --master --dbpath /data/masterdb/
With the --master option, the mongod will create a local.oplog.$main collection, which is the "operation log" that queues operations that the slaves will apply to replicate operations from the master. The --dbpath is optional.
To start a mongod instance in slave mode, invoke mongod as follows:
mongod --slave --source <masterhostname><:<port>> --dbpath /data/slavedb/
Specify the hostname and port of the master instance to the --source argument. The --dbpath is optional.
For slave instances, MongoDB stores data about the source server in the local.sources collection.
As an alternative to specifying the --source run-time option, you can add a document to local.sources specifying the master instance, as in the following operation in the mongo shell:
1   use local
2   db.sources.find()
3   db.sources.insert( { host: <masterhostname> <,only: databasename> } );
In line 1, you switch context to the local database. In line 2, the find() operation should return no documents, to ensure that there are no documents in the sources collection. Finally, line 3 uses db.collection.insert() to insert the source document into the local.sources collection. The model of the local.sources document is as follows:
The host field specifies the master mongod instance, and holds a resolvable hostname, an IP address, a name from a hosts file, or preferably a fully qualified domain name.
You can append <:port> to the host name if the mongod is not running on the default 27017 port.
Optional. Specify a name of a database. When specified, MongoDB will only replicate the indicated database.
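The host value can thus take either a bare hostname or a "host:port" form. The following sketch (illustrative only, not server code) shows the interpretation, applying the default port of 27017 when none is given:

```javascript
// Sketch: interpret a "<host>" or "<host>:<port>" source value,
// defaulting to MongoDB's standard port 27017.
function parseSource(host) {
  const idx = host.lastIndexOf(":");
  if (idx === -1) {
    return { hostname: host, port: 27017 };
  }
  return {
    hostname: host.slice(0, idx),
    port: parseInt(host.slice(idx + 1), 10)
  };
}

console.log(parseSource("master.example.net"));
// { hostname: 'master.example.net', port: 27017 }
console.log(parseSource("master.example.net:27018"));
// { hostname: 'master.example.net', port: 27018 }
```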
Master instances store operations in an oplog, which is a capped collection. As a result, if a slave falls too far behind the state of the master, it cannot "catch up" and must re-sync from scratch. A slave may fall out of sync with the master if:
When slaves are out of sync, replication stops. Administrators must intervene manually to restart replication, using the resync command. Alternatively, the --autoresync option allows a slave to restart replication automatically, after a ten-second pause, when the slave falls out of sync with the master. With --autoresync specified, the slave will only attempt to re-sync once in a ten-minute period.
To prevent these situations you should specify a larger oplog when you start the master instance, by adding the --oplogSize option when starting mongod. If you do not specify --oplogSize, mongod will allocate 5% of available disk space on start up to the oplog, with a minimum of 1 GB for 64-bit machines and 50 MB for 32-bit machines.
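The default allocation above can be sketched as a simple calculation. The figures here mirror the text; actual server behavior may vary by version and platform:

```javascript
// Sketch of the default oplog allocation described above: 5% of
// available disk space, floored at 1 GB (64-bit) or 50 MB (32-bit).
function defaultOplogSizeMB(freeDiskMB, is64bit) {
  const fivePercent = freeDiskMB * 0.05;
  const floorMB = is64bit ? 1024 : 50;
  return Math.max(fivePercent, floorMB);
}

console.log(defaultOplogSizeMB(100 * 1024, true)); // 5120 (5% of 100 GB)
console.log(defaultOplogSizeMB(10 * 1024, true));  // 1024 (floor of 1 GB)
```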
MongoDB provides a number of run time configuration options for mongod instances in master-slave deployments. You can specify these options in configuration files or on the command-line. See documentation of the following:
Also consider the Master-Slave Replication Command Line Options for related options.
On a master instance, issue the following operation in the mongo shell to return replication status from the perspective of the master:
db.printReplicationInfo()
On a slave instance, use the following operation in the mongo shell to return the replication status from the perspective of the slave:
db.printSlaveReplicationInfo()
Use the serverStatus command, as in the following operation, to return the status of replication:
db.serverStatus()
See server status repl fields for documentation of the relevant section of output.
When running with auth enabled in master-slave deployments, you must create a user account for the local database on both mongod instances. Log in and authenticate to the admin database on the slave instance, and then create the repl user on the local database, with the following operation:
use local
db.addUser('repl', <replpassword>)
Once created, repeat the operation on the master instance.
The slave instance first looks for a user named repl in the local.system.users collection. If present, the slave uses this user account to authenticate to the local database in the master instance. If the repl user does not exist, the slave instance attempts to authenticate using the first user document in the local.system.users collection.
The local database works like the admin database: an account for local has access to the entire server.
See also
Security for more information about security in MongoDB
If you want a replication configuration that resembles master-slave replication using replica sets, consider the following replica configuration document. In this deployment, hosts <master> and <slave> [1] provide replication that is roughly equivalent to a two-instance master-slave deployment:
{
_id : 'setName',
members : [
{ _id : 0, host : "<master>", priority : 1 },
{ _id : 1, host : "<slave>", priority : 0, votes : 0 }
]
}
See Replica Set Configuration for more information about replica set configurations.
[1] In replica set configurations, the host field must hold a resolvable hostname.
To permanently fail over from an unavailable or damaged master (A in the following example) to a slave (B):
Shut down A.
Stop mongod on B.
Back up and move all data files that begin with local on B from the dbpath.
Warning
Removing local.* is irrevocable and cannot be undone. Perform this step with extreme caution.
Restart mongod on B with the --master option.
Note
This is a one time operation, and is not reversible. A cannot become a slave of B until it completes a full resync.
If you have a master (A) and a slave (B) and you would like to reverse their roles, follow this procedure. The procedure assumes A is healthy, up-to-date and available.
If A is not healthy but the hardware is okay (power outage, server crash, etc.), skip steps 1 and 2 and, in step 8, replace all of A's files with B's files.
If A is not healthy and the hardware is not okay, replace A with a new machine. Also follow the instructions in the previous paragraph.
To invert the master and slave in a deployment:
Halt writes on A using the fsync command.
Make sure B is up to date with the state of A.
Shut down B.
Back up and move all data files that begin with local on B from the dbpath to remove the existing local.sources data.
Warning
Removing local.* is irrevocable and cannot be undone. Perform this step with extreme caution.
Start B with the --master option.
Do a write on B, which primes the oplog to provide a new sync start point.
Shut down B. B will now have a new set of data files that start with local.
Shut down A and replace all files in the dbpath of A that start with local with a copy of the files in the dbpath of B that begin with local.
Consider compressing the local files from B while you copy them, as they may be quite large.
Start B with the --master option.
Start A with all the usual slave options, but include fastsync.
If you can stop write operations to the master for an indefinite period, you can copy the data files from the master to the new slave and then start the slave with --fastsync.
Warning
Be careful with --fastsync. If the data on the two instances is not identical, a discrepancy will exist forever.
fastsync is a way to start a slave by starting with an existing master disk image/backup. This option declares that the administrator guarantees the image is correct and completely up-to-date with that of the master. If you have a full and complete copy of data from a master you can use this option to avoid a full synchronization upon starting the slave.
You can just copy the other slave’s data file snapshot without any special options. Only take data snapshots when a mongod process is down or locked using db.fsyncLock().
Slaves asynchronously apply write operations from the master, which the slaves poll from the master's oplog. The oplog is finite in length, and if a slave is too far behind, a full resync will be necessary. To resync the slave, connect to the slave using the mongo shell and issue the resync command:
use admin
db.runCommand( { resync: 1 } )
This forces a full resync of all data (which will be very slow on a large database). You can achieve the same effect by stopping mongod on the slave, deleting the entire content of the dbpath on the slave, and restarting the mongod.
Slaves cannot be “chained.” They must all connect to the master directly.
If a slave attempts to "slave from" another slave, you will see the following line in the mongod log:
assertion 13051 tailable cursor requested on non capped collection ns:local.oplog.$main
To change a slave’s source, manually modify the slave’s local.sources collection.
Example
Consider the following: you accidentally set an incorrect hostname for the slave's source, as in the following example:
mongod --slave --source prod.mississippi
You can correct this by restarting the slave without the --slave and --source arguments:
mongod
Connect to this mongod instance using the mongo shell and update the local.sources collection, with the following operation sequence:
use local
db.sources.update( { host : "prod.mississippi" }, { $set : { host : "prod.mississippi.example.net" } } )
Restart the slave with the correct command line arguments or with no --source option. After configuring local.sources the first time, the --source option will have no subsequent effect. Therefore, both of the following invocations are correct:
mongod --slave --source prod.mississippi.example.net
or
mongod --slave
The slave now polls data from the correct master.
The following tutorials describe certain replica set maintenance operations in detail:
This tutorial describes how to create a three member replica set from three existing instances of MongoDB. The tutorial provides one procedure for development and test systems and a separate procedure for production systems.
To deploy a replica set from a single standalone MongoDB instance, see Convert a Standalone to a Replica Set.
For background information on replica set deployments, see Replication Fundamentals and Replication Architectures.
Three member replica sets provide enough redundancy to survive most network partitions and other system failures. Additionally, these sets have sufficient capacity for many distributed read operations. Most deployments require no additional members or configuration.
A replica set requires three distinct systems so that each system can run its own instance of mongod. For development systems you can run all three instances of the mongod process on a local system (e.g. a laptop) or within a virtual instance. For production environments, you should endeavor to maintain as much separation between the members as possible. For example, when using VMs in production, each member should live on a separate host server, served by redundant power circuits and with redundant network paths.
These procedures assume you already have instances of MongoDB installed on the systems you will add as members of your replica set. If you have not already installed MongoDB, see the installation tutorials.
The examples in this procedure create a new replica set named rs0.
Before creating your replica set, verify that every member can successfully connect to every other member. The network configuration must allow all possible connections between any two members. To test connectivity, see Test Connections Between all Members.
Start three instances of mongod as members of a replica set named rs0, as described in this step. For ephemeral tests and the purposes of this guide, you may run the mongod instances in separate windows of GNU Screen. OS X and most Linux distributions come with screen installed by default [1].
Create the necessary data directories by issuing a command similar to the following:
mkdir -p /srv/mongodb/rs0-0 /srv/mongodb/rs0-1 /srv/mongodb/rs0-2
Issue the following commands, each in a distinct screen window:
mongod --port 27017 --dbpath /srv/mongodb/rs0-0 --replSet rs0
mongod --port 27018 --dbpath /srv/mongodb/rs0-1 --replSet rs0
mongod --port 27019 --dbpath /srv/mongodb/rs0-2 --replSet rs0
This starts each instance as a member of a replica set named rs0, each running on a distinct port. If you are already using these ports, you can select different ports. See the documentation of the following options for more information: --port, --dbpath, and --replSet.
Open a mongo shell and connect to the first mongod instance. If you’re running this command remotely, replace “localhost” with the appropriate hostname. In a new shell session, enter the following:
mongo localhost:27017
Note
In a replica set, either all members of the set must use localhost addresses, or no member can use a localhost address.
Use rs.initiate() to initiate a replica set consisting of the current member and using the default configuration:
rs.initiate()
Display the current replica configuration:
rs.conf()
Add two members to the replica set by issuing a sequence of commands similar to the following.
rs.add("localhost:27018")
rs.add("localhost:27019")
After these commands return, you have a fully functional replica set. New replica sets elect a primary within a few seconds.
Check the status of your replica set at any time with the rs.status() operation.
See also
The documentation of the following shell functions for more information:
You may also consider the simple setup script as an example of a basic automatically configured replica set.
[1] GNU Screen is packaged as screen on Debian-based, Fedora/Red Hat-based, and Arch Linux systems.
Production replica sets are very similar to the development or testing deployment described above, with the following differences:
Each member of the replica set resides on its own machine, and the MongoDB processes all bind to port 27017, which is the standard MongoDB port.
Each member of the replica set must be accessible by way of resolvable DNS or hostnames in the following scheme:
Configure DNS names appropriately, or set up your systems' /etc/hosts file to reflect this configuration.
You specify run-time configuration on each system in a configuration file stored in /etc/mongodb.conf or in a related location. You do not specify run-time configuration through command line options.
For each MongoDB instance, use the following configuration. Set configuration values appropriate to your systems:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0
You do not need to specify an interface with bind_ip. However, if you do not specify an interface, MongoDB listens for connections on all available IPv4 interfaces. Modify bind_ip to reflect a secure interface on your system that is able to access all other members of the set and on which all other members of the replica set can access the current member. The DNS or host names must point and resolve to this IP address. Configure network rules or a virtual private network (i.e. “VPN”) to permit this access.
For more documentation on run time options used above and on additional configuration options, see Configuration File Options.
To deploy a production replica set:
Before creating your replica set, verify that every member can successfully connect to every other member. The network configuration must allow all possible connections between any two members. To test connectivity, see Test Connections Between all Members.
On each system start the mongod process by issuing a command similar to following:
mongod --config /etc/mongodb.conf
Note
In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.
Open a mongo shell connected to this host:
mongo
Use rs.initiate() to initiate a replica set consisting of the current member and using the default configuration:
rs.initiate()
Display the current replica configuration:
rs.conf()
Add two members to the replica set by issuing a sequence of commands similar to the following.
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
After these commands return, you have a fully functional replica set. New replica sets elect a primary within a few seconds.
Check the status of your replica set at any time with the rs.status() operation.
See also
The documentation of the following shell functions for more information:
While standalone MongoDB instances are useful for testing, development and trivial deployments, for production use, replica sets provide required robustness and disaster recovery. This tutorial describes how to convert an existing standalone instance into a three-member replica set. If you’re deploying a replica set “fresh,” without any existing MongoDB data or instance, see Deploy a Replica Set.
For more information on replica sets, their use, and administration, see:
Note
If you’re converting a standalone instance into a replica set that is a shard in a sharded cluster you must change the shard host information in the config database. While connected to a mongos instance with a mongo shell, issue a command in the following form:
db.getSiblingDB("config").shards.save( {_id: "<name>", host: "<replica-set>/<member,><member,><...>" } )
Replace <name> with the name of the shard, replace <replica-set> with the name of the replica set, and replace <member,><member,><> with the list of the members of the replica set.
After completing this operation you must restart all mongos instances. When possible you should restart all components of the replica sets (i.e. all mongos and all shard mongod instances.)
This procedure assumes you have a standalone instance of MongoDB installed. If you have not already installed MongoDB, see the installation tutorials.
Shut down your MongoDB instance and then restart it using the --replSet option and the name of the replica set, which is rs0 in the example below.
Use a command similar to the following:
mongod --port 27017 --replSet rs0
This starts the instance as a member of a replica set named rs0. For more information on configuration options, see Configuration File Options and the mongod documentation.
Open a mongo shell and connect to the mongod instance. In a new system shell session, use the following command to start a mongo shell:
mongo
Use rs.initiate() to initiate the replica set:
rs.initiate()
The set is now operational. To return the replica set configuration, call the rs.conf() method. To check the status of the replica set, use rs.status().
Now add additional replica set members. On two distinct systems, start two new standalone mongod instances. Then, in the mongo shell instance connected to the first mongod instance, issue a command in the following form:
rs.add("<hostname>:<port>")
Replace <hostname> and <port> with the resolvable hostname and port of the mongod instance you want to add to the set. Repeat this operation for each mongod that you want to add to the set.
For more information on adding hosts to a replica set, see the Add Members to a Replica Set document.
This tutorial explains how to add an additional member to an existing replica set.
Before adding a new member, see the Adding Members topic in the Replica Set Administration document.
For background on replication deployment patterns, see the Replication Architectures document.
If neither of these conditions are satisfied, please use the MongoDB installation tutorial and the Deploy a Replica Set tutorial instead.
The examples in this procedure use the following configuration:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
logpath = /var/log/mongodb.log
fork = true
replSet = rs0
For more information on configuration options, see Configuration File Options.
This procedure uses the above example configuration.
Deploy a new mongod instance, specifying the name of the replica set. You can do this one of two ways:
Using the mongodb.conf file. On the primary, issue a command that resembles the following:
mongod --config /etc/mongodb.conf
Using command line arguments. On the primary, issue a command that resembles the following:
mongod --replSet rs0
Take note of the host name and port information for the new mongod instance.
Open a mongo shell connected to the replica set’s primary:
mongo
Note
The primary is the only member that can add or remove members from the replica set. If you do not know which member is the primary, log into any member of the replica set using mongo and run db.isMaster(); the isMaster.primary field reports the current primary. For example:
mongo mongodb0.example.net
db.isMaster()
If you are not connected to the primary, disconnect from the current client and reconnect to the primary.
In the mongo shell, issue the following command to add the new member to the replica set.
rs.add("mongodb3.example.net")
Note
You can also include the port number, depending on your setup:
rs.add("mongodb3.example.net:27017")
Verify that the member is now part of the replica set by calling the rs.conf() method, which displays the replica set configuration:
rs.conf()
You can use the rs.status() function to provide an overview of replica set status.
Alternately, you can add a member to a replica set by specifying an entire configuration document with some or all of the fields in a members document. For example:
rs.add({_id: 1, host: "mongodb3.example.net:27017", priority: 0, hidden: true})
This configures a hidden member that is accessible at mongodb3.example.net:27017. See host, priority, and hidden for more information about these settings. When you specify a full configuration object with rs.add(), you must declare the _id field, which is not automatically populated in this case.
This tutorial describes how to deploy a replica set with members in multiple locations. The tutorial addresses three-member sets, four-member sets, and sets with more than four members.
See also
For appropriate background, see Replication Fundamentals and Replication Architectures. For related tutorials, see Deploy a Replica Set and Add Members to a Replica Set.
While replica sets provide basic protection against single-instance failure, when all of the members of a replica set reside within a single facility, the replica set is still susceptible to some classes of errors within that facility including power outages, networking distortions, and natural disasters. To protect against these classes of failures, deploy a replica set with one or more members in a geographically distinct facility or data center.
For a three-member replica set you need two instances in a primary facility (hereafter, “Site A”) and one member in a secondary facility (hereafter, “Site B”.) Site A should be the same facility or very close to your primary application infrastructure (i.e. application servers, caching layer, users, etc.)
For a four-member replica set you need two members in Site A, two members in Site B (or one member in Site B and one member in Site C,) and a single arbiter in Site A.
For replica sets with additional members in the secondary facility or with multiple secondary facilities, the requirements are the same as above but with the following notes:
For all configurations in this tutorial, deploy each replica set member on a separate system. Although you may deploy more than one replica set member on a single system, doing so reduces the redundancy and capacity of the replica set. Such deployments are typically for testing purposes and beyond the scope of this tutorial.
A geographically distributed three-member deployment has the following features:
Each member of the replica set resides on its own machine, and the MongoDB processes all bind to port 27017, which is the standard MongoDB port.
Each member of the replica set must be accessible by way of resolvable DNS or hostnames in the following scheme:
Configure DNS names appropriately, or set up your systems' /etc/hosts file to reflect this configuration. Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.
Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:
Establish a virtual private network between the systems in Site A and Site B so that all traffic between the sites is encrypted and remains private. Ensure that your network topology routes all traffic between members within a single site over the local area network.
Configure authentication using auth and keyFile, so that only servers and processes with authentication can connect to the replica set.
Configure networking and firewall rules so that the deployment permits only traffic (incoming and outgoing packets) on the default MongoDB port (e.g. 27017), and only from within your deployment.
See also
For more information on security and firewalls, see Security Considerations for Replica Sets.
Specify run-time configuration on each system in a configuration file stored in /etc/mongodb.conf or in a related location. Do not specify run-time configuration through command line options.
For each MongoDB instance, use the following configuration, with values set appropriate to your systems:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net
Modify bind_ip to reflect a secure interface on your system that is able to access all other members of the set and that is accessible to all other members of the replica set. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. “VPN”) to permit this access.
Note
The portion of the replSet following the / provides a “seed list” of known members of the replica set. mongod uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet option resemble:
replSet = rs0
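The seed-list form of the replSet value can be pictured as follows. This plain-JavaScript sketch (illustrative only, not mongod's parser) splits the set name from the comma-separated seed hosts:

```javascript
// Sketch: split a replSet value of the form "setName/host1,host2,..."
// into the set name and its optional seed list.
function parseReplSetOption(value) {
  const slash = value.indexOf("/");
  if (slash === -1) {
    // No seed list given; just the set name.
    return { setName: value, seeds: [] };
  }
  return {
    setName: value.slice(0, slash),
    seeds: value.slice(slash + 1).split(",")
  };
}

const parsed = parseReplSetOption(
  "rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net"
);
console.log(parsed.setName);      // "rs0"
console.log(parsed.seeds.length); // 3
```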
For more documentation on the above run time configurations, as well as additional configuration options, see Configuration File Options.
To deploy a geographically distributed three-member set:
On each system start the mongod process by issuing a command similar to following:
mongod --config /etc/mongodb.conf
Note
In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.
Open a mongo shell connected to this host:
mongo
Use rs.initiate() to initiate a replica set consisting of the current member and using the default configuration:
rs.initiate()
Display the current replica configuration:
rs.conf()
Add the remaining members to the replica set by issuing a sequence of commands similar to the following. The example commands assume the current primary is mongodb0.example.net:
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
Make sure that you have configured the member located in Site B (i.e. mongodb2.example.net) as a secondary-only member:
Issue the following command to determine the members[n]._id value for mongodb2.example.net:
rs.conf()
In the member array, save the members[n]._id value. The example in the next step assumes this value is 2.
In the mongo shell connected to the replica set’s primary, issue a command sequence similar to the following:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
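The effect of this sequence can be modeled on a plain JavaScript object standing in for the document that rs.conf() returns; only rs.reconfig(cfg) applies the change to a live set:

```javascript
// A stand-in for the document returned by rs.conf(); the hosts are the
// example hosts from this tutorial.
var cfg = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
};

// Setting priority to 0 makes members[2] ineligible to become primary;
// on a live set, rs.reconfig(cfg) would apply this change.
cfg.members[2].priority = 0;

console.log(cfg.members[2].priority);  // 0
```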
Note
In some situations, the rs.reconfig() shell method can force the current primary to step down, which causes an election. When the primary steps down, all clients will disconnect. This is the intended behavior. Because this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.
After these commands return you have a geographically distributed three-member replica set.
To check the status of your replica set, issue rs.status().
See also
The documentation of the following shell functions for more information:
A geographically distributed four-member deployment has the following features:
Each member of the replica set, except for the arbiter (see below), resides on its own machine, and the MongoDB processes all bind to port 27017, which is the standard MongoDB port.
Each member of the replica set must be accessible by way of resolvable DNS or hostnames in the following scheme:
Configure DNS names appropriately, or set up your systems’ /etc/hosts files to reflect this configuration. Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.
One host (e.g. mongodb4.example.net) will be an arbiter and can run on a system that is also used for an application server or some other shared purpose.
There are three possible architectures for this replica set:
In most cases the first architecture is preferable because it is the least complex.
Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:
Establish a virtual private network between the systems in Site A and Site B (and Site C if it exists) so that all traffic between the sites is encrypted and remains private. Ensure that your network topology routes all traffic between members within a single site over the local area network.
Configure authentication using auth and keyFile, so that only servers and processes with authentication can connect to the replica set.
Configure networking and firewall rules so that they permit only incoming and outgoing traffic on the default MongoDB port (e.g. 27017), and only from within your deployment.
See also
For more information on security and firewalls, see Security Considerations for Replica Sets.
Specify run-time configuration on each system in a configuration file stored in /etc/mongodb.conf or in a related location. Do not specify run-time configuration through command line options.
For each MongoDB instance, use the following configuration, with values set appropriate to your systems:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net,mongodb3.example.net
Modify bind_ip to reflect a secure interface on your system that can reach all other members of the replica set and that all other members of the set can reach in turn. The DNS or host names must resolve to this IP address. Configure network rules or a virtual private network (i.e. “VPN”) to permit this access.
Note
The portion of the replSet following the / provides a “seed list” of known members of the replica set. mongod uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet option resemble:
replSet = rs0
For more documentation on the above run time configurations, as well as additional configuration options, see Configuration File Options.
To deploy a geographically distributed four-member set:
On each system start the mongod process by issuing a command similar to the following:
mongod --config /etc/mongodb.conf
Note
In production deployments, you will likely want to configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.
Open a mongo shell connected to this host:
mongo
Use rs.initiate() to initiate a replica set consisting of the current member and using the default configuration:
rs.initiate()
Display the current replica configuration:
rs.conf()
Add the remaining members to the replica set by issuing a sequence of commands similar to the following. The example commands assume the current primary is mongodb0.example.net:
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
rs.add("mongodb3.example.net")
In the same shell session, issue the following command to add the arbiter (e.g. mongodb4.example.net):
rs.addArb("mongodb4.example.net")
Make sure that you have configured each member located in Site B (e.g. mongodb3.example.net) as a secondary-only member:
Issue the following command to determine the members[n]._id value for the member:
rs.conf()
In the member array, save the members[n]._id value. The example in the next step assumes this value is 2.
In the mongo shell connected to the replica set’s primary, issue a command sequence similar to the following:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
Note
In some situations, the rs.reconfig() shell method can force the current primary to step down, which causes an election. When the primary steps down, all clients will disconnect. This is the intended behavior. Because this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.
After these commands return you have a geographically distributed four-member replica set.
To check the status of your replica set, issue rs.status().
See also
The documentation of the following shell functions for more information:
The procedure for deploying a geographically distributed set with more than four members is similar to the above procedures, with the following differences:
The oplog exists internally as a capped collection, so you cannot modify its size in the course of normal operations. In most cases the default oplog size is an acceptable size; however, in some situations you may need a larger or smaller oplog. For example, you might need to change the oplog size if your applications perform large numbers of multi-updates or deletes in short periods of time.
This tutorial describes how to resize the oplog. For a detailed explanation of oplog sizing, see the Oplog topic in the Replication Fundamentals document. For details on how the oplog size affects delayed members and replication lag, see the Delayed Members topic and the Check the Replication Lag topic in Replica Set Administration.
The following is an overview of the procedure for changing the size of the oplog:
The examples in this procedure use the following configuration:
To change the size of the oplog for a replica set, use the following procedure for every member of the set that may become primary.
Shut down the mongod instance and restart it in “standalone” mode running on a different port.
Note
Shutting down the primary member of the set will trigger a failover situation and another member in the replica set will become primary. In most cases, it is least disruptive to modify the oplogs of all the secondaries before modifying the primary.
To shut down the instance, use a command that resembles the following:
mongod --dbpath /srv/mongodb --shutdown
To restart the instance on a different port and in “standalone” mode (i.e. without replSet or --replSet), use a command that resembles the following:
mongod --port 37017 --dbpath /srv/mongodb
Back up the existing oplog on the standalone instance. Use the following command:
mongodump --db local --collection 'oplog.rs' --port 37017
Connect to the instance using the mongo shell:
mongo --port 37017
Save the last entry from the old (current) oplog.
In the mongo shell, enter the following command to use the local database to interact with the oplog:
use local
Use the db.collection.save() operation to save the last entry in the oplog to a temporary collection:
db.temp.save( db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next() )
You can see this oplog entry in the temp collection by issuing the following command:
db.temp.find()
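Outside the shell, the save-the-last-entry step can be sketched with a plain array standing in for the oplog. The descending sort on ts here models the { $natural : -1 } sort, and the values are made up for illustration:

```javascript
// Toy oplog: each entry carries a timestamp (ts) and hash (h), matching
// the projection { ts: 1, h: 1 } used in the shell command above.
var oplog = [
  { ts: 100, h: 11 },
  { ts: 200, h: 22 },
  { ts: 300, h: 33 }
];

// Equivalent of .sort({ $natural: -1 }).limit(1).next(): take the newest
// entry without modifying the original array.
var last = oplog.slice().sort(function (a, b) { return b.ts - a.ts; })[0];

// Equivalent of db.temp.save(...): keep the entry in a temporary store.
var temp = [last];

console.log(temp[0].ts);  // 300
```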
Drop the old oplog.rs collection in the local database. Use the following command:
db.oplog.rs.drop()
This operation returns true in the shell.
Use the create command to create a new oplog of a different size. Specify the size argument in bytes. A value of 2147483648 will create a new oplog that’s 2 gigabytes:
db.runCommand( { create : "oplog.rs", capped : true, size : 2147483648 } )
Upon success, this command returns the following status:
{ "ok" : 1 }
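To compute the size argument for other oplog sizes, multiply the desired number of gigabytes by 1024 cubed. A minimal sketch:

```javascript
// The create command's size argument is in bytes. For a 2 gigabyte oplog:
var GB = 1024 * 1024 * 1024;
var oplogSizeBytes = 2 * GB;

console.log(oplogSizeBytes);  // 2147483648
```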
Insert the previously saved last entry from the old oplog into the new oplog:
db.oplog.rs.save( db.temp.findOne() )
To confirm the entry is in the new oplog, issue the following command:
db.oplog.rs.find()
Restart the server as a member of the replica set on its usual port:
mongod --dbpath /srv/mongodb --shutdown
mongod --replSet rs0 --dbpath /srv/mongodb
The replica set member will recover, “catch up,” and then be eligible for election to primary. To step down the “temporary” primary that took over when you initially shut down the server, use the rs.stepDown() method. This will force an election for primary. If the server’s priority is higher than all other members in the set and if it has successfully “caught up,” then it will likely become primary.
Repeat this procedure for all other members of the replica set that are or could become primary.
You can force a replica set member to become primary by giving it a higher members[n].priority value than any other member in the set.
Optionally, you also can force a member never to become primary by setting its members[n].priority value to 0, which means the member can never seek election as primary. For more information, see Secondary-Only Members.
Changed in version 2.0.
For more information on priorities, see Member Priority.
This procedure assumes your current primary is m1.example.net and that you’d like to instead make m3.example.net primary. The procedure also assumes you have a three-member replica set with the configuration below. For more information on configurations, see Replica Set Configuration Use.
This procedure assumes this configuration:
{
"_id" : "rs",
"version" : 7,
"members" : [
{
"_id" : 0,
"host" : "m1.example.net:27017"
},
{
"_id" : 1,
"host" : "m2.example.net:27017"
},
{
"_id" : 2,
"host" : "m3.example.net:27017"
}
]
}
In the mongo shell, use the following sequence of operations to make m3.example.net the primary:
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 0.5
cfg.members[2].priority = 1
rs.reconfig(cfg)
This sets m3.example.net to have a higher members[n].priority value than the other mongod instances.
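A small sketch can check which member these priorities favor. The preferredPrimary helper is hypothetical and ignores the other factors (member state, replication progress) that real elections weigh:

```javascript
// Stand-in for the reconfigured rs.conf() document from this procedure.
var cfg = {
  _id: "rs",
  version: 7,
  members: [
    { _id: 0, host: "m1.example.net:27017", priority: 0.5 },
    { _id: 1, host: "m2.example.net:27017", priority: 0.5 },
    { _id: 2, host: "m3.example.net:27017", priority: 1 }
  ]
};

// Among electable members, the highest-priority member is preferred.
function preferredPrimary(cfg) {
  return cfg.members.reduce(function (best, m) {
    return m.priority > best.priority ? m : best;
  });
}

console.log(preferredPrimary(cfg).host);  // "m3.example.net:27017"
```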
The following sequence of events occurs:
Optionally, if m3.example.net is more than 10 seconds behind m1.example.net‘s optime, and if you don’t need to have a primary designated within 10 seconds, you can force m1.example.net to step down by running:
db.adminCommand({replSetStepDown:1000000, force:1})
This prevents m1.example.net from being primary for 1,000,000 seconds, even if there is no other member that can become primary. When m3.example.net catches up with m1.example.net it will become primary.
If you later want to make m1.example.net eligible to become primary again while it waits for m3.example.net to catch up, issue the following command on m1.example.net:
rs.freeze(0)
The rs.freeze() method provides a wrapper around the replSetFreeze database command.
Changed in version 1.8.
Consider a replica set with the following members:
To force a member to become primary use the following procedure:
In a mongo shell, run rs.status() to ensure your replica set is running as expected.
In a mongo shell connected to the mongod instance running on mdb2.example.net, freeze mdb2.example.net so that it does not attempt to become primary for 120 seconds:
rs.freeze(120)
In a mongo shell connected to the mongod running on mdb0.example.net, step down this instance so that it is not eligible to become primary for 120 seconds:
rs.stepDown(120)
mdb1.example.net becomes primary.
Note
During the transition, there is a short window where the set does not have a primary.
For more information, consider the rs.freeze() and rs.stepDown() methods that wrap the replSetFreeze and replSetStepDown commands.
For most replica sets, the hostnames [1] in the members[n].host field never change. However, in some cases you must migrate some or all host names in a replica set as organizational needs change. This document presents two possible procedures for changing the hostnames in the members[n].host field. Depending on your environment’s availability requirements, you may:
Make the configuration change without disrupting the availability of the replica set. While this ensures that your application will always be able to read and write data to the replica set, this procedure can take a long time and may incur downtime at the application layer. [2]
For this procedure, see Changing Hostnames while Maintaining the Replica Set’s Availability.
Stop all members of the replica set, which are running on the “old” hostnames or interfaces, at once; make the configuration changes; and then start the members at the new hostnames or interfaces. While the set will be totally unavailable during the operation, the total maintenance window is often shorter.
For this procedure, see Changing All Hostnames in Replica Set at Once.
See also
And the following tutorials:
| [1] | Always use resolvable hostnames for the value of the members[n].host field in the replica set configuration to avoid confusion and complexity. |
| [2] | You will have to configure your applications so that they can connect to the replica set at both the old and new locations. This often requires a restart and reconfiguration at the application layer, which may affect the availability of your applications. This re-configuration is beyond the scope of this document and makes the second option preferable when you must change the hostnames of all members of the replica set at once. |
Given a replica set with three members:
And with the following rs.conf() output:
{
"_id" : "rs",
"version" : 3,
"members" : [
{
"_id" : 0,
"host" : "database0.example.com:27017"
},
{
"_id" : 1,
"host" : "database1.example.com:27017"
},
{
"_id" : 2,
"host" : "database2.example.com:27017"
}
]
}
The following procedures change the members’ hostnames as follows:
Use the most appropriate procedure for your deployment.
This procedure uses the above assumptions.
For each secondary in the replica set, perform the following sequence of operations:
Stop the secondary.
Restart the secondary at the new location.
Open a mongo shell connected to the replica set’s primary. In our example, the primary runs on port 27017 so you would issue the following command:
mongo --port 27017
Run the following reconfiguration sequence, which updates the members[n].host value where n is 1:
cfg = rs.conf()
cfg.members[1].host = "mongodb1.example.net:27017"
rs.reconfig(cfg)
See Replica Set Configuration for more information.
Make sure your client applications are able to access the set at the new location and that the secondary has a chance to catch up with the other members of the set.
Repeat the above steps for each non-primary member of the set.
Open a mongo shell connected to the primary and step down the primary using replSetStepDown. In the mongo shell, use the rs.stepDown() wrapper, as follows:
rs.stepDown()
When the step down succeeds, shut down the primary.
To make the final configuration change, connect to the new primary in the mongo shell and reconfigure the members[n].host value where n is 0:
cfg = rs.conf()
cfg.members[0].host = "mongodb0.example.net:27017"
rs.reconfig(cfg)
Start the original primary.
Open a mongo shell connected to the primary.
To confirm the new configuration, call rs.conf() in the mongo shell.
Your output should resemble:
{
"_id" : "rs",
"version" : 4,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:27017"
},
{
"_id" : 1,
"host" : "mongodb1.example.net:27017"
},
{
"_id" : 2,
"host" : "mongodb2.example.net:27017"
}
]
}
This procedure uses the above assumptions.
Stop all members in the replica set.
Restart each member on a different port and without using the --replSet run-time option. Changing the port number during maintenance prevents clients from connecting to this host while you perform maintenance. Use the member’s usual --dbpath, which in this example is /data/db1. Use a command that resembles the following:
mongod --dbpath /data/db1/ --port 37017
For each member of the replica set, perform the following sequence of operations:
Open a mongo shell connected to the mongod running on the new, temporary port. For example, for a member running on a temporary port of 37017, you would issue this command:
mongo --port 37017
Edit the replica set configuration manually. The replica set configuration is the only document in the system.replset collection in the local database. Edit the replica set configuration with the new hostnames and correct ports for all the members of the replica set. Consider the following sequence of commands to change the hostnames in a three-member set:
use local
cfg = db.system.replset.findOne( { "_id": "rs" } )
cfg.members[0].host = "mongodb0.example.net:27017"
cfg.members[1].host = "mongodb1.example.net:27017"
cfg.members[2].host = "mongodb2.example.net:27017"
db.system.replset.update( { "_id": "rs" } , cfg )
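The manual edit above amounts to rewriting each member's host field in a single document. Modeled on a plain JavaScript object, assuming this tutorial's example hostnames:

```javascript
// Stand-in for the single document in local.system.replset.
var cfg = {
  _id: "rs",
  version: 3,
  members: [
    { _id: 0, host: "database0.example.com:27017" },
    { _id: 1, host: "database1.example.com:27017" },
    { _id: 2, host: "database2.example.com:27017" }
  ]
};

// Rewrite every member's host, mirroring the command sequence above. The
// old-to-new mapping follows this tutorial's example hosts.
var newHosts = [
  "mongodb0.example.net:27017",
  "mongodb1.example.net:27017",
  "mongodb2.example.net:27017"
];
cfg.members.forEach(function (m, i) { m.host = newHosts[i]; });

console.log(cfg.members[2].host);  // "mongodb2.example.net:27017"
```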
Stop the mongod process on the member.
After re-configuring all members of the set, start each mongod instance in the normal way: use the usual port number and use the --replSet option. For example:
mongod --dbpath /data/db1/ --port 27017 --replSet rs
Connect to one of the mongod instances using the mongo shell. For example:
mongo --port 27017
To confirm the new configuration, call rs.conf() in the mongo shell.
Your output should resemble:
{
"_id" : "rs",
"version" : 4,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:27017"
},
{
"_id" : 1,
"host" : "mongodb1.example.net:27017"
},
{
"_id" : 2,
"host" : "mongodb2.example.net:27017"
}
]
}
If you have a secondary in a replica set that no longer needs to hold a copy of the data but that you want to retain in the set to ensure that the replica set will be able to elect a primary, you can convert the secondary into an arbiter. This document provides two equivalent procedures for this process.
Both of the following procedures are operationally equivalent. Choose whichever procedure you are most comfortable with:
You may operate the arbiter on the same port as the former secondary. In this procedure, you must shut down the secondary and remove its data before restarting and reconfiguring it as an arbiter.
For this procedure, see Convert a Secondary to an Arbiter and Reuse the Port Number.
Run the arbiter on a new port. In this procedure, you can reconfigure the server as an arbiter before shutting down the instance running as a secondary.
For this procedure, see Convert a Secondary to an Arbiter Running on a New Port Number.
If your application is connecting directly to the secondary, modify the application so that MongoDB queries don’t reach the secondary.
Shut down the secondary.
Remove the secondary from the replica set by calling the rs.remove() method. Perform this operation while connected to the current primary in the mongo shell:
rs.remove("<hostname>:<port>")
Verify that the replica set no longer includes the secondary by calling the rs.conf() method in the mongo shell:
rs.conf()
Move the secondary’s data directory to an archive folder. For example:
mv /data/db /data/db-old
Optional
You may remove the data instead.
Create a new, empty data directory to point to when restarting the mongod instance. You can reuse the previous name. For example:
mkdir /data/db
Restart the mongod instance for the secondary, specifying the port number, the empty data directory, and the replica set. You can use the same port number you used before. Issue a command similar to the following:
mongod --port 27021 --dbpath /data/db --replSet rs
In the mongo shell convert the secondary to an arbiter using the rs.addArb() method:
rs.addArb("<hostname>:<port>")
Verify the arbiter belongs to the replica set by calling the rs.conf() method in the mongo shell.
rs.conf()
The arbiter member should include the following:
"arbiterOnly" : true
If your application is connecting directly to the secondary or has a connection string referencing the secondary, modify the application so that MongoDB queries don’t reach the secondary.
Create a new, empty data directory to be used with the new port number. For example:
mkdir /data/db-temp
Start a new mongod instance on the new port number, specifying the new data directory and the existing replica set. Issue a command similar to the following:
mongod --port 27021 --dbpath /data/db-temp --replSet rs
In the mongo shell connected to the current primary, convert the new mongod instance to an arbiter using the rs.addArb() method:
rs.addArb("<hostname>:<port>")
Verify the arbiter has been added to the replica set by calling the rs.conf() method in the mongo shell.
rs.conf()
The arbiter member should include the following:
"arbiterOnly" : true
Shut down the secondary.
Remove the secondary from the replica set by calling the rs.remove() method in the mongo shell:
rs.remove("<hostname>:<port>")
Verify that the replica set no longer includes the old secondary by calling the rs.conf() method in the mongo shell:
rs.conf()
Move the secondary’s data directory to an archive folder. For example:
mv /data/db /data/db-old
Optional
You may remove the data instead.
If MongoDB does not shut down cleanly, [1] the on-disk representation of the data files will likely be in an inconsistent state, which could lead to data corruption. [2]
To prevent data inconsistency and corruption, always shut down the database cleanly and use durability journaling. The journal writes data to disk every 100 milliseconds by default and ensures that MongoDB can recover to a consistent state even in the case of an unclean shutdown due to power loss or other system failure.
If you are not running as part of a replica set and do not have journaling enabled, use the following procedure to recover data that may be in an inconsistent state. If you are running as part of a replica set, you should always restore from a backup or restart the mongod instance with an empty dbpath and allow MongoDB to resync the data.
See also
The Administration documents, including Replica Set Syncing, and the documentation on the repair, repairpath, and journal settings.
| [1] | To ensure a clean shut down, use the mongod --shutdown option, your control script, “Control-C” (when running mongod in interactive mode), or kill $(pidof mongod) or kill -2 $(pidof mongod). |
| [2] | You can also use the db.collection.validate() method to test the integrity of a single collection. However, this process is time consuming, and without journaling you can safely assume that the data is in an invalid state and you should either run the repair operation or resync from an intact member of the replica set. |
If a mongod instance running without journaling stops unexpectedly and you are not running with replication, always run the repair operation before starting MongoDB again. If you are using replication, then restore from a backup and allow replication to synchronize your data.
If the mongod.lock file in the data directory specified by dbpath, /data/db by default, is not a zero-byte file, then mongod will refuse to start, and you will find a message that contains the following line in your MongoDB log or standard output:
Unclean shutdown detected.
This indicates that you need to remove the lock file and run repair. If you run repair when the mongod.lock file exists, without the mongod --repairpath option, you will see a message that contains the following line:
old lock file: /data/db/mongod.lock. probably means unclean shutdown
You must remove the lockfile and run the repair operation before starting the database normally using the following procedure:
Warning
Recovering a member of a replica set.
Do not use this procedure to recover a member of a replica set. Instead you should either restore from a backup or resync from an intact member of the set, as described in Resyncing a Member of a Replica Set.
There are two processes to repair data files that result from an unexpected shutdown:
Use the --repair option in conjunction with the --repairpath option. mongod will read the existing data files, and write the existing data to new data files. This does not modify or alter the existing data files.
You do not need to remove the mongod.lock file before using this procedure.
Use the --repair option. mongod will read the existing data files, write the existing data to new files and replace the existing, possibly corrupt, files with new files.
You must remove the mongod.lock file before using this procedure.
Note
--repair functionality is also available in the shell with the db.repairDatabase() helper for the repairDatabase command.
To repair your data files using the --repairpath option to preserve the original data files unmodified:
Start mongod using --repair to read the existing data files.
mongod --dbpath /data/db --repair --repairpath /data/db0
When this completes, the new repaired data files will be in the /data/db0 directory.
Start mongod using the following invocation to point the dbpath at /data/db0:
mongod --dbpath /data/db0
Once you confirm that the data files are operational, you may delete or archive the original data files in the /data/db directory.
To repair your data files without preserving the original files, do not use the --repairpath option, as in the following procedure:
Remove the stale lock file:
rm /data/db/mongod.lock
Replace /data/db with your dbpath where your MongoDB instance’s data files reside.
Warning
After you remove the mongod.lock file you must run the --repair process before using your database.
Start mongod using --repair to read the existing data files.
mongod --dbpath /data/db --repair
When this completes, the repaired data files will replace the original data files in the /data/db directory.
Start mongod using the following invocation to point the dbpath at /data/db:
mongod --dbpath /data/db
In normal operation, you should never remove the mongod.lock file and start mongod. Instead use one of the above methods to recover the database and remove the lock files. In dire situations you can remove the lockfile, and start the database using the possibly corrupt files, and attempt to recover data from the database; however, it’s impossible to predict the state of the database in these situations.
If you are not running with journaling, and your database shuts down unexpectedly for any reason, you should always proceed as if your database is in an inconsistent and likely corrupt state. If at all possible restore from backup or if running as a replica set resync from an intact member of the set, as described in Resyncing a Member of a Replica Set.
The following describes the replica set configuration object:
The following describe MongoDB output and status related to replication:
This page lists the documents, tutorials, and reference pages that describe sharding.
For an overview, see Sharding Fundamentals. To configure, maintain, and troubleshoot sharded clusters, see Sharded Cluster Administration. For deployment architectures, see Sharded Cluster Architectures. For details on the internal operations of sharding, see Sharding Internals. For procedures for performing certain sharding tasks, see the Tutorials list.
The following is the outline of the main documentation:
This document provides an overview of the fundamental concepts and operations of sharding with MongoDB. For a list of all sharding documentation see Sharding.
MongoDB’s sharding system allows users to partition a collection within a database to distribute the collection’s documents across a number of mongod instances or shards. Sharding increases write capacity, provides the ability to support larger working sets, and raises the limits of total data size beyond the physical resources of a single node.
With sharding, MongoDB automatically distributes data among a collection of mongod instances. Sharding, as implemented in MongoDB, has the following features:
Sharding increases capacity in two ways:
A typical sharded cluster consists of:
While sharding is a powerful and compelling feature, it comes with significant Infrastructure Requirements and some complexity costs. As a result, use sharding only as necessary and when indicated by actual operational requirements. Consider the following overview of indications that it may be time to consider sharding.
You should consider deploying a sharded cluster, if:
If these attributes are not present in your system, sharding will only add additional complexity to your system without providing much benefit. When designing your data model, if you will eventually need a sharded cluster, consider which collections you will want to shard and the corresponding shard keys.
Warning
It takes time and resources to deploy sharding, and if your system has already reached or exceeded its capacity, you will have a difficult time deploying sharding without impacting your application.
As a result, if you think you will need to partition your database in the future, do not wait until your system is overcapacity to enable sharding.
A sharded cluster has the following components:
Three config servers.
These special mongod instances store the metadata for the cluster. The mongos instances cache this data and use it to determine which shard is responsible for which chunk.
For development and testing purposes you may deploy a cluster with a single configuration server process, but always use exactly three config servers for redundancy and safety in production.
Two or more shards. Each shard consists of one or more mongod instances that store the data for the shard.
These “normal” mongod instances hold all of the actual data for the cluster.
Typically each shard is a replica set. Each replica set consists of multiple mongod instances. The members of the replica set provide redundancy and high availability for the data in each shard.
Warning
MongoDB enables data partitioning, or sharding, on a per-collection basis. You must access all data in a sharded cluster via the mongos instances, as described below. If you connect directly to a mongod in a sharded cluster, you will see only its fraction of the cluster’s data. The data on any given shard may be somewhat random: MongoDB provides no guarantee that any two contiguous chunks will reside on a single shard.
One or more mongos instances.
These instances route queries from the application layer to the shards that hold the data. The mongos instances have no persistent state or data files and only cache metadata in RAM from the config servers.
Note
In most situations mongos instances use minimal resources, and you can run them on your application servers without impacting application performance. However, if you use the aggregation framework some processing may occur on the mongos instances, causing that mongos to require more system resources.
Your cluster must manage a significant quantity of data for sharding to have an effect on your collection. The default chunk size is 64 megabytes, [1] and the balancer will not begin moving data until the imbalance of chunks in the cluster exceeds the migration threshold.
Practically, this means that unless your cluster has many hundreds of megabytes of data, chunks will remain on a single shard.
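As a rough sanity check, you can estimate how many default-sized chunks a collection would occupy. This sketch assumes the 64 megabyte default and ignores shard key distribution and split timing, so treat it only as an order-of-magnitude illustration:

```javascript
// Rough estimate of chunk count for a collection of a given size,
// assuming evenly filled chunks of the given size in megabytes.
function estimatedChunks(dataMB, chunkSizeMB) {
  return Math.max(1, Math.ceil(dataMB / chunkSizeMB));
}

console.log(estimatedChunks(50, 64));    // 1  -- data this small stays on one shard
console.log(estimatedChunks(6400, 64));  // 100
```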
While there are some exceptional situations where you may need to shard a small collection of data, most of the time sharding a small collection is not worth the additional complexity and overhead, unless you need additional concurrency or capacity for some reason. If you have a small data set, a properly configured single MongoDB instance or replica set will usually be more than sufficient for your persistence layer needs.
| [1] | Chunk size is user configurable. However, the default value of 64 megabytes is ideal for most deployments. See the Chunk Size section in the Sharding Internals document for more information. |
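To make the migration threshold concrete, the balancing trigger can be sketched in shell-style JavaScript. The threshold values here (2 for fewer than 20 chunks in the cluster, 4 for 20–79, 8 for 80 or more) are assumptions based on 2.2-era balancer defaults, and migrationThreshold and shouldBalance are hypothetical helpers, not MongoDB functions:

```javascript
// Assumed 2.2-era thresholds: the balancer starts a round only when
// the chunk-count gap between the most- and least-loaded shards
// reaches the threshold for the cluster's total chunk count.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

function shouldBalance(chunksPerShard) {
  var total = chunksPerShard.reduce(function (a, b) { return a + b; }, 0);
  var max = Math.max.apply(null, chunksPerShard);
  var min = Math.min.apply(null, chunksPerShard);
  return (max - min) >= migrationThreshold(total);
}
```

For example, with 10 chunks on one shard and 2 on another, the gap of 8 exceeds the small-cluster threshold of 2 and balancing would begin; with 5 and 4 chunks the cluster stays as-is.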
Because all components of a sharded cluster must communicate with each other over the network, there are special restrictions regarding the use of localhost addresses:
If you use either “localhost” or “127.0.0.1” as the host identifier, then you must use “localhost” or “127.0.0.1” for all host settings for any MongoDB instances in the cluster. This applies to both the host argument to addShard and the value to the mongos --configdb run time option. If you mix localhost addresses with remote host addresses, MongoDB will produce errors.
The “shard key” is the field, present in every document in a collection, that MongoDB uses to distribute documents among the shards. Like an index, a shard key can be either a single field or a compound key consisting of multiple fields.
Remember, MongoDB’s sharding is range-based: each chunk holds documents having a specific range of values for the “shard key”. Thus, choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster.
Appropriate shard key choice depends on the schema of your data and the way that your application queries and writes data to the database.
The ideal shard key:
The challenge when selecting a shard key is that there is not always an obvious choice. Often, an existing field in your collection may not be the optimal key. In those situations, computing a special purpose shard key into an additional field or using a compound shard key may help produce one that is more ideal.
Config servers maintain the shard metadata in a config database. The config database stores the relationship between chunks and where they reside within a sharded cluster. Without a config database, the mongos instances would be unable to route queries or write operations within the cluster.
Config servers do not run as replica sets. Instead, a cluster operates with a group of three config servers that use a two-phase commit process that ensures immediate consistency and reliability.
For testing purposes you may deploy a cluster with a single config server, but this is not recommended for production.
Warning
If your cluster has a single config server, this mongod is a single point of failure. If the instance is inaccessible the cluster is not accessible. If you cannot recover the data on a config server, the cluster will be inoperable.
Always use three config servers for production deployments.
The actual load on configuration servers is small because each mongos instance maintains a cached copy of the configuration database. MongoDB only writes data to the config server to:
Additionally, all config servers must be available during the initial setup of a sharded cluster, because each mongos instance must be able to write to the config.version collection.
If one or two configuration instances become unavailable, the cluster’s metadata becomes read only. It is still possible to read and write data from the shards, but no chunk migrations or splits will occur until all three servers are accessible. At the same time, config server data is only read in the following situations:
If all three config servers are inaccessible, you can continue to use the cluster as long as you don’t restart the mongos instances until after the config servers are accessible again. If you restart the mongos instances and there are no accessible config servers, the mongos instances will be unable to direct queries or write operations to the cluster.
Because the configuration data is small relative to the amount of data stored in a cluster, the amount of activity is relatively low, and 100% uptime is not required for a functioning sharded cluster. As a result, backing up the config servers is not difficult. These backups are critical, as clusters become totally inoperable when you lose all configuration instances and data. Take precautions to ensure that the config servers remain available and intact.
Note
Configuration servers store metadata for a single sharded cluster. You must have a separate configuration server or servers for each cluster you administer.
The mongos provides a single unified interface to a sharded cluster for applications using MongoDB. Except for the selection of a shard key, application developers and administrators need not consider any of the internal details of sharding.
mongos caches data from the config server, and uses this to route operations from applications and clients to the mongod instances. mongos have no persistent state and consume minimal system resources.
The most common practice is to run mongos instances on the same systems as your application servers, but you can maintain mongos instances on the shards or on other dedicated resources.
Note
Changed in version 2.1.
Some aggregation operations that use the aggregate command (i.e. db.collection.aggregate()) will cause mongos instances to require more CPU resources than in previous versions. This modified performance profile may dictate alternate architecture decisions if you use the aggregation framework extensively in a sharded environment.
mongos uses information from config servers to route operations to the cluster as efficiently as possible. In general, operations in a sharded environment are either:
When possible you should design your operations to be as targeted as possible. Operations have the following targeting characteristics:
Query operations broadcast to all shards [2] unless the mongos can determine which shard or shards store the data.
For queries that include the shard key, mongos can target the query at a specific shard or set of shards, if the portion of the shard key included in the query is a prefix of the shard key. For example, if the shard key is:
{ a: 1, b: 1, c: 1 }
The mongos can route queries that include the full shard key or either of the following shard key prefixes at a specific shard or set of shards:
{ a: 1 }
{ a: 1, b: 1 }
Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still have to contact multiple shards [3] to fulfill these queries.
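The prefix rule described above can be sketched as follows; targetablePrefixLength is a hypothetical helper that illustrates the idea and is not part of mongos:

```javascript
// Given the shard key's field names in order (e.g. ["a", "b", "c"])
// and the fields a query constrains, return how many leading shard
// key fields the query covers. A result of 0 means the query cannot
// be targeted and broadcasts to all shards holding the collection.
function targetablePrefixLength(shardKeyFields, queryFields) {
  var n = 0;
  while (n < shardKeyFields.length &&
         queryFields.indexOf(shardKeyFields[n]) !== -1) {
    n++;
  }
  return n;
}
```

With the shard key { a: 1, b: 1, c: 1 }, a query on fields a and b covers a prefix of length 2 and can be targeted, while a query on b and c alone covers no prefix and must broadcast.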
All insert() operations target exactly one shard.
All single update() operations target exactly one shard. This includes upsert operations.
The mongos broadcasts multi-update operations to every shard.
The mongos broadcasts remove() operations to every shard unless the operation specifies the shard key in full.
While some operations must broadcast to all shards, you can improve performance by using as many targeted operations as possible by ensuring that your operations include the shard key.
| [2] | If a shard does not store chunks from a given collection, queries for documents in that collection are not broadcast to that shard. |
| [3] | mongos will route some queries, even some that include the shard key, to all shards, if needed. |
To route a query to a cluster, mongos uses the following process:
Determine the list of shards that must receive the query.
In some cases, when the shard key or a prefix of the shard key is a part of the query, the mongos can route the query to a subset of the shards. Otherwise, the mongos must direct the query to all shards that hold documents for that collection.
Example
Given the following shard key:
{ zipcode: 1, u_id: 1, c_date: 1 }
Depending on the distribution of chunks in the cluster, the mongos may be able to target the query at a subset of shards, if the query contains the following fields:
{ zipcode: 1 }
{ zipcode: 1, u_id: 1 }
{ zipcode: 1, u_id: 1, c_date: 1 }
Establish a cursor on all targeted shards.
When the first batch of results returns from the cursors:
For a query with sorted results (i.e. using cursor.sort()) the mongos performs a merge sort of the results from all shards.
For a query with unsorted results, the mongos returns a result cursor that “round robins” results from all cursors on the shards.
Changed in version 2.0.5: Before 2.0.5, the mongos exhausted each cursor, one by one.
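A rough sketch of the two merging strategies, assuming each shard’s cursor has already returned a batch as an array; this illustrates the observable behavior, not the mongos implementation:

```javascript
// Sorted queries: combine all shard results into one sorted sequence.
// (A real merge sort would interleave the already-sorted inputs;
// sorting the concatenation yields the same result for this sketch.)
function mergeSorted(cursors, cmp) {
  var all = [].concat.apply([], cursors);
  return all.sort(cmp);
}

// Unsorted queries: "round robin" one result at a time from each
// shard's cursor until every cursor is exhausted.
function roundRobin(cursors) {
  var out = [], i = 0, remaining = true;
  while (remaining) {
    remaining = false;
    for (var c = 0; c < cursors.length; c++) {
      if (i < cursors[c].length) {
        out.push(cursors[c][i]);
        remaining = true;
      }
    }
    i++;
  }
  return out;
}
```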
Balancing is the process MongoDB uses to redistribute data within a sharded cluster. When a shard has too many chunks compared to other shards, MongoDB balances the shards.
The balancing process attempts to minimize the impact that balancing can have on the cluster, by:
You may disable the balancer on a temporary basis for maintenance and limit the window during which it runs to prevent the balancing process from impacting production traffic.
See also
Note
The balancing procedure for sharded clusters is entirely transparent to the user and application layer. This documentation is only included for your edification and possible troubleshooting purposes.
Note
You should always run all mongod components in trusted networking environments that control access to the cluster using network rules and restrictions to ensure that only known traffic reaches your mongod and mongos instances.
Warning
Limitations
Changed in version 2.2: Read-only authentication is fully supported in sharded clusters. Previously, in version 2.0, sharded clusters would not enforce read-only limitations.
Changed in version 2.0: Sharded clusters support authentication. Previously, in version 1.8, sharded clusters did not support authentication and access control, and you had to run your sharded systems in trusted environments.
To control access to a sharded cluster, you must set the keyFile option on all components of the sharded cluster. Use the --keyFile run-time option or the keyFile configuration option for all mongos, configuration instances, and shard mongod instances.
There are two classes of security credentials in a sharded cluster: credentials for “admin” users (i.e. for the admin database) and credentials for all other databases. These credentials reside in different locations within the cluster and have different roles:
This means that you can authenticate to these users and databases while connected directly to the primary shard for a database. However, for clarity and consistency all interactions between the client and the database should use a mongos instance.
Note
Individual shards can store administrative credentials to their instance, which only permit access to a single shard. MongoDB stores these credentials in the shards’ admin databases and these credentials are completely distinct from the cluster-wide administrative credentials.
This document describes common administrative tasks for sharded clusters. For complete documentation of sharded clusters see the Sharding section of this manual.
Sharding Procedures:
Before deploying a cluster, see Sharding Requirements.
For testing purposes, you can run all the required shard mongod processes on a single server. For production, use the configurations described in Replication Architectures.
Warning
Sharding and “localhost” Addresses
If you use either “localhost” or 127.0.0.1 as the hostname portion of any host identifier, for example as the host argument to addShard or the value to the --configdb run time option, then you must use “localhost” or 127.0.0.1 for all host settings for any MongoDB instances in the cluster. If you mix localhost addresses and remote host addresses, MongoDB will produce errors.
If you have an existing replica set, you can use the Convert a Replica Set to a Replicated Sharded Cluster tutorial as a guide. If you’re deploying a cluster from scratch, see the Deploy a Sharded Cluster tutorial for more detail or use the following procedure as a quick starting point:
Create data directories for each of the three (3) config server instances.
Start the three config server instances. For example, to start a config server instance running on TCP port 27018 with the data stored in /data/configdb, type the following:
mongod --configsvr --dbpath /data/configdb --port 27018
For additional command options, see mongod and Configuration File Options.
Note
All config servers must be running and available when you first initiate a sharded cluster.
Start a mongos instance. For example, to start a mongos that connects to config server instance running on the following hosts:
You would issue the following command:
mongos --configdb mongoc0.example.net:27018,mongoc1.example.net:27018,mongoc2.example.net:27018
Connect to one of the mongos instances. For example, if a mongos is accessible at mongos0.example.net on port 27017, issue the following command:
mongo mongos0.example.net
Add shards to the cluster.
Note
In production deployments, all shards should be replica sets.
To deploy a replica set, see the Deploy a Replica Set tutorial.
From the mongo shell connected to the mongos instance, call the sh.addShard() method for each shard to add to the cluster.
For example:
sh.addShard( "mongodb0.example.net:27027" )
If mongodb0.example.net:27027 is a member of a replica set, call the sh.addShard() method with an argument that resembles the following:
sh.addShard( "<setName>/mongodb0.example.net:27027" )
Replace <setName> with the name of the replica set, and MongoDB will discover all other members of the replica set. Repeat this step for each new shard in your cluster.
Optional
You can specify a name for the shard and a maximum size. See addShard.
Note
Changed in version 2.0.3.
Before version 2.0.3, you had to specify the shard in the following form:
replicaSetName/<seed1>,<seed2>,<seed3>
For example, if the name of the replica set is repl0, then your sh.addShard() command would be:
sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" )
Enable sharding for each database you want to shard. While sharding operates on a per-collection basis, you must enable sharding for each database that holds collections you want to shard. This step is a meta-data change and will not redistribute your data.
MongoDB enables sharding on a per-database basis. To enable sharding for a given database, use the enableSharding command or the sh.enableSharding() shell helper.
db.runCommand( { enableSharding: <database> } )
Or:
sh.enableSharding(<database>)
Note
MongoDB creates databases automatically upon their first use.
Once you enable sharding for a database, MongoDB assigns a primary shard for that database, where MongoDB stores all data before sharding begins.
Enable sharding on a per-collection basis.
Finally, you must explicitly specify collections to shard. The collections must belong to a database for which you have enabled sharding. When you shard a collection, you also choose the shard key. To shard a collection, run the shardCollection command or the sh.shardCollection() shell helper.
db.runCommand( { shardCollection: "<database>.<collection>", key: { <shard-key>: 1 } } )
Or:
sh.shardCollection("<database>.<collection>", <shard-key>)
For example:
db.runCommand( { shardCollection: "myapp.users", key: { username: 1 } } )
Or:
sh.shardCollection("myapp.users", { username: 1 })
The choice of shard key is incredibly important: it affects everything about the cluster from the efficiency of your queries to the distribution of data. Furthermore, you cannot change a collection’s shard key after setting it.
See the Shard Key Overview and the more in depth documentation of Shard Key Qualities to help you select better shard keys.
If you do not specify a shard key, MongoDB will shard the collection using the _id field.
This section outlines procedures for adding and removing shards, as well as general monitoring and maintenance of a sharded cluster.
To add a shard to an existing sharded cluster, use the following procedure:
Connect to a mongos in the cluster using the mongo shell.
First, you need to tell the cluster where to find the individual shards. You can do this using the addShard command or the sh.addShard() helper:
sh.addShard( "<hostname>:<port>" )
Replace <hostname> and <port> with the hostname and TCP port number of where the shard is accessible. Alternately specify a replica set name and at least one hostname which is a member of the replica set.
For example:
sh.addShard( "mongodb0.example.net:27027" )
Note
In production deployments, all shards should be replica sets.
Repeat for each shard in your cluster.
Optional
You may specify a “name” as an argument to the addShard command, as follows:
db.runCommand( { addShard: "mongodb0.example.net", name: "mongodb0" } )
You cannot specify a name for a shard using the sh.addShard() helper in the mongo shell. If you use the helper or do not specify a shard name, then MongoDB will assign a name upon creation.
Changed in version 2.0.3: Before version 2.0.3, you must specify the shard in the following form: the replica set name, followed by a forward slash, followed by a comma-separated list of seeds for the replica set. For example, if the name of the replica set is “myapp1”, then your sh.addShard() command might resemble:
sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" )
Note
It may take some time for chunks to migrate to the new shard.
For an introduction to balancing, see Balancing and Distribution. For lower level information on balancing, see Cluster Balancer.
To remove a shard from a sharded cluster, you must:
Note
To successfully migrate data from a shard, the balancer process must be active.
The procedure to remove a shard is as follows:
Connect to a mongos in the cluster using the mongo shell.
Determine the name of the shard you will be removing.
You must specify the name of the shard. You may have specified this shard name when you first ran the addShard command. If not, you can find out the name of the shard by running the listShards or printShardingStatus commands or the sh.status() shell helper.
The following examples will remove a shard named mongodb0 from the cluster.
Begin removing chunks from the shard.
Start by running the removeShard command. This will start “draining” or migrating chunks from the shard you’re removing to another shard in the cluster.
db.runCommand( { removeShard: "mongodb0" } )
This operation will return the following response immediately:
{ msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 }
Depending on your network capacity and the amount of data in the shard, this operation can take anywhere from a few minutes to several days to complete.
View progress of the migration.
You can run the removeShard command again at any stage of the process to view the progress of the migration, as follows:
db.runCommand( { removeShard: "mongodb0" } )
The output should look something like this:
{ msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 }
In the remaining sub-document { chunks: xx, dbs: y }, a counter displays the remaining number of chunks that MongoDB must migrate to other shards and the number of MongoDB databases that have “primary” status on this shard.
Continue checking the status of the removeShard command until the remaining number of chunks to transfer is 0.
Move any databases to other shards in the cluster as needed.
This is only necessary when removing a shard that is also the primary shard for one or more databases.
Issue the following command at the mongo shell:
db.runCommand( { movePrimary: "myapp", to: "mongodb1" })
This command will migrate all remaining non-sharded data in the database named myapp to the shard named mongodb1.
Warning
Do not run the movePrimary command until you have finished draining the shard.
The command will not return until MongoDB completes moving all data. The response from this command will resemble the following:
{ "primary" : "mongodb1", "ok" : 1 }
Run removeShard again to clean up all metadata information and finalize the shard removal, as follows:
db.runCommand( { removeShard: "mongodb0" } )
When successful, this command will return a document like this:
{ msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 }
Once the value of the stage field is “completed,” you may safely stop the processes comprising the mongodb0 shard.
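The decision logic in the removal steps above can be summarized in a small helper; nextStep is a hypothetical illustration only, not a MongoDB API:

```javascript
// Interpret a removeShard response and decide the next action in the
// shard-removal procedure: keep polling while chunks drain, move any
// primary databases once draining finishes, then rerun removeShard
// to finalize; "done" means the shard's processes can be stopped.
function nextStep(resp) {
  if (resp.state === "completed" || resp.stage === "completed") return "done";
  if (resp.remaining && resp.remaining.chunks > 0) return "wait";
  if (resp.remaining && resp.remaining.dbs > 0) return "movePrimary";
  return "rerun removeShard";
}
```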
To list the databases that have sharding enabled, query the databases collection in the Config Database Contents. A database has sharding enabled if the value of the partitioned field is true. Connect to a mongos instance with a mongo shell, and run the following operation to get a full list of databases with sharding enabled:
use config
db.databases.find( { "partitioned": true } )
Example
You can use the following sequence of commands to return a list of all databases in the cluster:
use config
db.databases.find()
If this returns the following result set:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" }
{ "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" }
Then sharding is only enabled for the animals database.
To list the current set of configured shards, use the listShards command, as follows:
use admin
db.runCommand( { listShards : 1 } )
To view cluster details, issue db.printShardingStatus() or sh.status(). Both methods return the same output.
Example
Consider the following example output from sh.status():
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "shard0000", "host" : "m0.example.net:30001" }
{ "_id" : "shard0001", "host" : "m3.example2.net:50000" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "animals", "partitioned" : true, "primary" : "shard0000" }
foo.big chunks:
shard0001 1
shard0000 6
{ "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo
{ "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo
{ "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo
{ "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo
{ "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo
{ "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4)
{ "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5)
foo.large chunks:
shard0001 1
shard0000 5
{ "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0)
{ "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo
{ "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo
{ "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo
{ "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4)
{ "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5)
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
This section describes various operations on chunks in sharded clusters. MongoDB automates most chunk management operations. However, these chunk management operations are accessible to administrators for use in some situations, typically surrounding initial setup, deployment, and data ingestion.
Normally, MongoDB splits a chunk following inserts when a chunk exceeds the chunk size. The balancer may migrate recently split chunks to a new shard immediately if mongos predicts future insertions will benefit from the move.
MongoDB treats all chunks the same, whether split manually or automatically by the system.
Warning
You cannot merge or combine chunks once you have split them.
You may want to split chunks manually if:
Example
You plan to insert a large amount of data with shard key values between 300 and 400, but all documents with shard key values between 250 and 500 are in a single chunk.
Use sh.status() to determine the current chunks ranges across the cluster.
To split chunks manually, use the split command with either the middle or the find parameter. The equivalent shell helpers are sh.splitAt() and sh.splitFind().
Example
The following command will split the chunk that contains the value of 63109 for the zipcode field in the people collection of the records database:
sh.splitFind( "records.people", { "zipcode": 63109 } )
sh.splitFind() will split the chunk that contains the first document matching this query into two equally sized chunks. You must specify the full namespace (i.e. “<database>.<collection>”) of the sharded collection to sh.splitFind(). The query in sh.splitFind() need not contain the shard key, though it almost always makes sense to query for the shard key in this case; including the shard key will expedite the operation.
Use sh.splitAt() to split a chunk in two using the queried document as the partition point:
sh.splitAt( "records.people", { "zipcode": 63109 } )
However, the location of the document that this query finds with respect to the other documents in the chunk does not affect how the chunk splits.
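A minimal sketch of the difference between the two helpers, assuming the chunk’s documents are available sorted by shard key; splitPointFind and splitPointAt are hypothetical names, not server code:

```javascript
// splitFind-style: divide the chunk into two roughly equal halves,
// regardless of where the matched document sits in the chunk.
function splitPointFind(docs) {
  return docs[Math.floor(docs.length / 2)];
}

// splitAt-style: split exactly at the queried shard key value, so the
// matched document becomes the lower bound of the new chunk.
function splitPointAt(docs, value, key) {
  return docs.filter(function (d) { return d[key] >= value; })[0];
}
```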
In most situations a sharded cluster will create and distribute chunks automatically without user intervention. However, in a limited number of use profiles, MongoDB cannot create enough chunks or distribute data fast enough to support required throughput. Consider the following scenarios:
you must partition an existing data collection that resides on a single shard.
you must ingest a large volume of data into a cluster that isn’t balanced, or where the ingestion of data will lead to an imbalance of data.
This can arise in an initial data loading, or in a case where you must insert a large volume of data into a single chunk, as is the case when you must insert at the beginning or end of the chunk range, as is the case for monotonically increasing or decreasing shard keys.
Preemptively splitting chunks increases cluster throughput for these operations, by reducing the overhead of migrating chunks that hold data during the write operation. MongoDB only creates splits after an insert operation, and can only migrate a single chunk at a time. Chunk migrations are resource intensive and further complicated by large write volume to the migrating chunk.
To create and migrate chunks manually, use the following procedure:
Split empty chunks in your collection by manually performing split command on chunks.
Example
To create chunks for documents in the myapp.users collection, using the email field as the shard key, use the following operation in the mongo shell:
for ( var x = 97; x < 97 + 26; x++ ) {
    for ( var y = 97; y < 97 + 26; y += 6 ) {
        var prefix = String.fromCharCode(x) + String.fromCharCode(y);
        db.runCommand( { split: "myapp.users", middle: { email: prefix } } );
    }
}
This assumes a collection size of 100 million documents.
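To see what the split loop produces, the same nested loop can be run in isolation, collecting the split points instead of issuing split commands:

```javascript
// Generate the two-letter email prefixes the split loop uses:
// 26 first letters ("a".."z") times 5 second letters ("a", "g",
// "m", "s", "y") = 130 split points, i.e. 131 chunks in total.
var prefixes = [];
for (var x = 97; x < 97 + 26; x++) {
  for (var y = 97; y < 97 + 26; y += 6) {
    prefixes.push(String.fromCharCode(x) + String.fromCharCode(y));
  }
}
```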
Migrate chunks manually using the moveChunk command:
Example
To migrate all of the manually created user profiles evenly, putting each prefix chunk on the next shard from the other, run the following commands in the mongo shell:
var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ];
for ( var x = 97; x < 97 + 26; x++ ) {
    for ( var y = 97; y < 97 + 26; y += 6 ) {
        var prefix = String.fromCharCode(x) + String.fromCharCode(y);
        db.adminCommand( { moveChunk: "myapp.users", find: { email: prefix }, to: shServer[ ( y - 97 ) / 6 ] } );
    }
}
You can also let the balancer automatically distribute the new chunks. For an introduction to balancing, see Balancing and Distribution. For lower level information on balancing, see Cluster Balancer.
When you initialize a sharded cluster, the default chunk size is 64 megabytes. This default chunk size works well for most deployments. However, if you notice that automatic migrations are incurring a level of I/O that your hardware cannot handle, you may want to reduce the chunk size. For the automatic splits and migrations, a small chunk size leads to more rapid and frequent migrations.
To modify the chunk size, use the following procedure:
Connect to any mongos in the cluster using the mongo shell.
Issue the following command to switch to the Config Database Contents:
use config
Issue the following save() operation:
db.settings.save( { _id:"chunksize", value: <size> } )
Where the value of <size> reflects the new chunk size in megabytes. Here, you’re essentially writing a document whose values store the global chunk size configuration value.
Note
The chunkSize and --chunkSize options, passed at runtime to the mongos, do not affect the chunk size after you have initialized the cluster.
To eliminate confusion you should always set chunk size using the above procedure and never use the runtime options.
Modifying the chunk size has several limitations:
If you increase the chunk size, existing chunks must grow through insertion or updates until they reach the new size.
In most circumstances, you should let the automatic balancer migrate chunks between shards. However, you may want to migrate chunks manually in a few cases:
For more information on how chunks move between shards, see Cluster Balancer, in particular the section Chunk Migration.
To migrate chunks, use the moveChunk command.
Note
To return a list of shards, use the listShards command.
Specify shard names using the addShard command using the name argument. If you do not specify a name in the addShard command, MongoDB will assign a name automatically.
The following example assumes that the field username is the shard key for a collection named users in the myapp database, and that the value smith exists within the chunk you want to migrate.
To move this chunk, you would issue the following command from a mongo shell connected to any mongos instance.
db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example.net"})
This command moves the chunk that includes the shard key value “smith” to the shard named mongodb-shard3.example.net. The command will block until the migration is complete.
See Create Chunks (Pre-Splitting) for an introduction to pre-splitting.
New in version 2.2: The moveChunk command has the _secondaryThrottle parameter. When set to true, MongoDB ensures that secondary members have replicated operations before allowing new chunk migrations.
Warning
The moveChunk command may produce the following error message:
The collection's metadata lock is already taken.
These errors occur when clients have too many open cursors that access the chunk you are migrating. You can either wait until the cursors complete their operation or close the cursors manually.
Large bulk insert operations, including initial data ingestion or routine data import, can have a significant impact on a sharded cluster. Consider the following strategies and possibilities for bulk insert operations:
If the collection does not have data, then there is only one chunk, which must reside on a single shard. MongoDB must receive data, create splits, and distribute chunks to the available shards. To avoid this performance cost, you can pre-split the collection, as described in Create Chunks (Pre-Splitting).
You can parallelize the import by sending insert operations to more than one mongos instance. If the collection is empty, pre-split first, as described in Create Chunks (Pre-Splitting).
If your shard key increases monotonically during an insert then all the inserts will go to the last chunk in the collection, which will always end up on a single shard. Therefore, the insert capacity of the cluster will never exceed the insert capacity of a single shard.
If your insert volume is never larger than what a single shard can process, then there is no problem; however, if the insert volume exceeds that range, and you cannot avoid a monotonically increasing shard key, then consider the following modifications to your application:
Example
The following example, in C++, swaps the leading and trailing 16-bit words of generated BSON ObjectIds so that they are no longer monotonically increasing.
using namespace mongo;

OID make_an_id() {
    OID x = OID::gen();
    const unsigned char *p = x.getData();
    swap( (unsigned short&) p[0], (unsigned short&) p[10] );
    return x;
}

void foo() {
    // create an object
    BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
    // now we might insert o into a sharded collection...
}
For information on choosing a shard key, see Shard Keys and Shard Key Internals (in particular, Operations and Reliability and Choosing a Shard Key).
This section provides common administrative procedures related to balancing. For an introduction to balancing, see Balancing and Distribution. For lower level information on balancing, see Cluster Balancer.
To see if the balancer process is active in your cluster, do the following:
Connect to any mongos in the cluster using the mongo shell.
Issue the following command to switch to the config database:
use config
Use the following query to return the balancer lock:
db.locks.find( { _id : "balancer" } ).pretty()
When this command returns, you will see output like the following:
{ "_id" : "balancer",
"process" : "mongos0.example.net:1292810611:1804289383",
"state" : 2,
"ts" : ObjectId("4d0f872630c42d1978be8a2e"),
"when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)",
"who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886",
"why" : "doing balance round" }
This output confirms that:
Optional
You can also use the following shell helper, which returns a boolean to report if the balancer is active:
sh.getBalancerState()
In some situations, particularly when your data set grows slowly and a migration can impact performance, it’s useful to be able to ensure that the balancer is active only at certain times. Use the following procedure to specify a window during which the balancer will be able to migrate chunks:
Connect to any mongos in the cluster using the mongo shell.
Issue the following command to switch to the config database:
use config
Use an operation modeled on the following example update() operation to modify the balancer’s window:
db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "<start-time>", stop : "<stop-time>" } } }, true )
Replace <start-time> and <stop-time> with time values using two-digit hour and minute values (i.e. HH:MM) that specify the beginning and end boundaries of the balancing window. These times will be evaluated relative to the time zone of each individual mongos instance in the sharded cluster. For instance, running the following will force the balancer to run between 11PM and 6AM local time only:
db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "06:00" } } }, true )
Note
The balancer window must be sufficient to complete the migration of all data inserted during the day.
As data insert rates can change based on activity and usage patterns, it is important to ensure that the balancing window you select will be sufficient to support the needs of your deployment.
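Because the update() operation accepts any string for the window boundaries, a malformed value such as "6:00" can slip in silently. The following sketch shows a client-side validation helper; the function name is illustrative and not part of the mongo shell:

```javascript
// Illustrative helper (not a mongo shell built-in): verify that a
// balancing window boundary uses the required two-digit HH:MM format,
// "00:00" through "23:59".
function isValidWindowTime(value) {
  return /^([01][0-9]|2[0-3]):[0-5][0-9]$/.test(value);
}
```

For example, isValidWindowTime("23:00") accepts a valid boundary, while isValidWindowTime("6:00") rejects the single-digit hour form.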
If you have set the balancing window and wish to remove the schedule so that the balancer is always running, issue the following sequence of operations:
use config
db.settings.update( { _id : "balancer" }, { $unset : { activeWindow : true } } )
By default the balancer may run at any time and only moves chunks as needed. To disable the balancer for a short period of time and prevent all migration, use the following procedure:
Connect to any mongos in the cluster using the mongo shell.
Issue one of the following operations to disable the balancer:
sh.stopBalancer()
Later, issue one of the following operations to enable the balancer:
sh.startBalancer()
Note
If a migration is in progress, the system will complete the in-progress migration before stopping. After disabling, you can use the following operation in the mongo shell to determine if there are no migrations in progress:
use config
while( db.locks.findOne({_id: "balancer"}).state ) {
print("waiting..."); sleep(1000);
}
The above process and the sh.setBalancerState(), sh.startBalancer(), and sh.stopBalancer() helpers are wrappers around the following process, which may be useful if you need to run this operation from a driver that does not have helper functions:
Connect to any mongos in the cluster using the mongo shell.
Issue the following command to switch to the config database:
use config
Issue the following update to disable the balancer:
db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );
To enable the balancer again, alter the value of “stopped” as follows:
db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true );
Config servers store all cluster metadata, most importantly, the mapping from chunks to shards. This section provides an overview of the basic procedures to migrate, replace, and maintain these servers.
See also
For redundancy, all production sharded clusters should deploy three config server processes on three different machines.
Do not use a single config server for production deployments; use single config server deployments only for testing. You should upgrade to three config servers immediately if you are shifting to production. The following process shows how to convert a test deployment with one config server to a production deployment with three config servers.
Shut down all existing MongoDB processes. This includes:
Copy the entire dbpath file system tree from the existing config server to the two machines that will provide the additional config servers. These commands, issued on the system with the existing config database, mongo-config0.example.net, may resemble the following:
rsync -az /data/configdb mongo-config1.example.net:/data/configdb
rsync -az /data/configdb mongo-config2.example.net:/data/configdb
Start all three config servers, using the same invocation that you used for the single config server.
mongod --configsvr
Restart all shard mongod and mongos processes.
Use this process when you need to migrate a config server to a new system but the new system will be accessible using the same host name.
Shut down the config server that you’re moving.
This will render all config data for your cluster read only.
Change the DNS entry that points to the system that provided the old config server, so that the same hostname points to the new system.
How you do this depends on how you organize your DNS and hostname resolution services.
Move the entire dbpath file system tree from the old config server to the new config server. This command, issued on the old config server system, may resemble the following:
rsync -az /data/configdb mongo-config0.example.net:/data/configdb
Start the config instance on the new system. The default invocation is:
mongod --configsvr
When you start the third config server, your cluster will become writable and it will be able to create new splits and migrate chunks as needed.
Use this process when you need to migrate a config server to a new system that will not be accessible via the same hostname. If possible, avoid changing the hostname so that you can use the previous procedure.
Shut down the config server you’re moving.
This will render all config data for your cluster “read only.”
Copy the entire dbpath file system tree from the old config server to the new config server. This command, issued on the old config server system, may resemble the following:
rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
Start the config instance on the new system. The default invocation is:
mongod --configsvr
Shut down all existing MongoDB processes. This includes:
Restart all mongod processes that provide the shard servers.
Update the --configdb parameter (or configdb) for all mongos instances and restart all mongos instances.
Use this procedure only if you need to replace one of your config servers after it becomes inoperable (e.g. hardware failure). This process assumes that the hostname of the instance will not change. If you must change the hostname of the instance, use the process for migrating a config server to a different hostname.
Provision a new system, with the same hostname as the previous host.
You will have to ensure that the new system has the same IP address and hostname as the system it’s replacing or you will need to modify the DNS records and wait for them to propagate.
Shut down one (and only one) of the existing config servers. Copy all this host’s dbpath file system tree from the current system to the system that will provide the new config server. This command, issued on the system with the data files, may resemble the following:
rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
Restart the config server process that you used in the previous step to copy the data files to the new config server instance.
Start the new config server instance. The default invocation is:
mongod --configsvr
The cluster will remain operational [1] without one of the config database’s mongod instances, so creating a backup of the cluster metadata from the config database is straightforward:
See also
| [1] | While one of the three config servers is unavailable, the cluster cannot split any chunks, nor can it migrate chunks between shards. Your application will still be able to write data to the cluster. The Config Servers section of the documentation provides more information on this topic. |
The two most important factors in maintaining a successful sharded cluster are:
You can prevent most issues encountered with sharding by ensuring that you choose the best possible shard key for your deployment and by adding additional capacity to your cluster well before the current resources become saturated. Continue reading for specific issues you may encounter in a production environment.
Your cluster must have sufficient data for sharding to make sense. Sharding works by migrating chunks between the shards until each shard has roughly the same number of chunks.
The default chunk size is 64 megabytes. MongoDB will not begin migrations until the imbalance of chunks in the cluster exceeds the migration threshold. While the default chunk size is configurable with the chunkSize setting, these behaviors help prevent unnecessary chunk migrations, which can degrade the performance of your cluster as a whole.
If you have just deployed a sharded cluster, make sure that you have enough data to make sharding effective. If you do not have sufficient data to create more than eight 64 megabyte chunks, then all data will remain on one shard. Either lower the chunk size setting, or add more data to the cluster.
As a related problem, the system will split chunks only on inserts or updates, which means that if you configure sharding and do not continue to issue insert and update operations, the database will not create any chunks. You can either wait until your application inserts data or split chunks manually.
Finally, if your shard key has a low cardinality, MongoDB may not be able to create sufficient splits among the data.
In some situations, a single shard or a subset of the cluster will receive a disproportionate portion of the traffic and workload. In almost all cases this is the result of a shard key that does not effectively allow write scaling.
It’s also possible that you have “hot chunks.” In this case, you may be able to solve the problem by splitting and then migrating parts of these chunks.
In the worst case, you may have to consider re-sharding your data and choosing a different shard key to correct this pattern.
If you have just deployed your sharded cluster, you may want to consider the troubleshooting suggestions for a new cluster where data remains on a single shard.
If the cluster was initially balanced, but later developed an uneven distribution of data, consider the following possible causes:
If migrations impact your cluster or application’s performance, consider the following options, depending on the nature of the impact:
It’s also possible that your shard key causes your application to direct all writes to a single shard. This kind of activity pattern can require the balancer to migrate most data soon after writing it. Consider redeploying your cluster with a shard key that provides better write scaling.
If MongoDB migrates a chunk during a backup, you can end with an inconsistent snapshot of your sharded cluster. Never run a backup while the balancer is active. To ensure that the balancer is inactive during your backup operation:
Confirm that the balancer is not active using the sh.getBalancerState() method before starting a backup operation. When the backup procedure is complete you can reactivate the balancer process.
This document describes the organization and design of sharded cluster deployments. For documentation of common administrative tasks related to sharded clusters, see Sharded Cluster Administration. For complete documentation of sharded clusters see the Sharding section of this manual.
See also
Warning
Use this architecture for testing and development only.
You can deploy a very minimal cluster for testing and development. These non-production clusters have the following components:
When deploying a production cluster, you must ensure that the data is redundant and that your systems are highly available. To that end, a production-level cluster must have the following components:
3 config servers, each residing on a discrete system.
Note
A single sharded cluster must have exclusive use of its config servers. If you have multiple sharded clusters, you will need to have a group of config servers for each cluster.
2 or more replica sets, for the shards.
See
For more information on replica sets see Replication Architectures and Replication.
mongos instances. Typically, you will deploy a single mongos instance on each application server. Alternatively, you may deploy several mongos nodes and let your application connect to these via a load balancer.
See also
Sharding operates on the collection level. You can shard multiple collections within a database, or have multiple databases with sharding enabled. [1] However, in production deployments some databases and collections will use sharding, while other databases and collections will only reside on a single database instance or replica set (i.e. a shard).
Note
Regardless of the data architecture of your sharded cluster, ensure that all queries and operations use the mongos router to access the data cluster. Use the mongos even for operations that do not impact the sharded data.
Every database has a “primary” [2] shard that holds all un-sharded collections in that database. All collections that are not sharded reside on the primary for their database. Use the movePrimary command to change the primary shard for a database. Use the printShardingStatus command or the sh.status() method to see an overview of the cluster, which includes information about the chunk and database distribution within the cluster.
Warning
The movePrimary command can be expensive because it copies all non-sharded data to the new shard, during which that data will be unavailable for other operations.
When you deploy a new sharded cluster, the “first shard” becomes the primary for all databases before sharding is enabled. Databases created subsequently may reside on any shard in the cluster.
| [1] | As you configure sharding, you will use the enableSharding command to enable sharding for a database. This simply makes it possible to use the shardCollection command on a collection within that database. |
| [2] | The term “primary” in the context of databases and sharding, has nothing to do with the term primary in the context of replica sets. |
A production cluster has no single point of failure. This section introduces the availability concerns for MongoDB deployments, and highlights potential failure scenarios and available resolutions:
Application servers or mongos instances become unavailable.
If each application server has its own mongos instance, other application servers can continue to access the database. Furthermore, mongos instances do not maintain persistent state, and they can restart and become unavailable without losing any state or data. When a mongos instance starts, it retrieves a copy of the config database and can begin routing queries.
A single mongod becomes unavailable in a shard.
Replica sets provide high availability for shards. If the unavailable mongod is a primary, then the replica set will elect a new primary. If the unavailable mongod is a secondary, it can rejoin the set and catch up when it reconnects. In a three member replica set, even if a single member of the set experiences catastrophic failure, two other members have full copies of the data.
Always investigate availability interruptions and failures. If a system is unrecoverable, replace it and create a new member of the replica set as soon as possible to replace the lost redundancy.
All members of a replica set become unavailable.
If all members of a replica set within a shard are unavailable, all data held on that shard is unavailable. However, the data on all other shards will remain available, and it’s possible to read and write data to the other shards. Your application must be able to deal with partial results, and you should investigate the cause of the interruption and attempt to recover the shard as soon as possible.
One or two config databases become unavailable.
Three distinct mongod instances provide the config database, using a special two-phase commit to maintain consistent state between these mongod instances. Cluster operation will continue as normal, but chunk migration will stop and the cluster cannot create new chunk splits. Replace the config server as soon as possible. If all config databases become unavailable, the cluster can become inoperable.
Note
All config servers must be running and available when you first initiate a sharded cluster.
This document introduces lower level sharding concepts for users who are familiar with sharding generally and want to learn more about the internals. This document provides a more detailed understanding of your cluster’s behavior. For higher level sharding concepts, see Sharding Fundamentals. For complete documentation of sharded clusters see the Sharding section of this manual.
Shard keys are the field in a collection that MongoDB uses to distribute documents within a sharded cluster. See the overview of shard keys for an introduction to these topics.
Cardinality, in the context of MongoDB, refers to the ability of the system to partition data into chunks. For example, consider a collection of data such as an “address book” that stores address records:
Consider the use of a state field as a shard key:
The state key’s value holds the US state for a given address document. This field has a low cardinality as all documents that have the same value in the state field must reside on the same shard, even if a particular state’s chunk exceeds the maximum chunk size.
Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly among a small number of fixed chunks. This may have a number of effects:
Consider the use of a zipcode field as a shard key:
While this field has a large number of possible values, and thus has potentially higher cardinality, it’s possible that a large number of users could have the same value for the shard key, which would make this chunk of users un-splittable.
In these cases, cardinality depends on the data. If your address book stores records for a geographically distributed contact list (e.g. “dry cleaning businesses in America”), then a value like zipcode would be sufficient. However, if your address book is more geographically concentrated (e.g. “ice cream stores in Boston, Massachusetts”), then you may have a much lower cardinality.
Consider the use of a phone-number field as a shard key:
Phone numbers have high cardinality because users will generally have a unique value for this field. As a result, MongoDB will be able to create as many chunks as needed.
While “high cardinality” is necessary for ensuring an even distribution of data, having a high cardinality does not guarantee sufficient query isolation or appropriate write scaling. Please continue reading for more information on these topics.
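To make the comparison above concrete, the following sketch estimates the cardinality of a candidate shard key by counting distinct values in a sample of documents. The helper is illustrative only, not a MongoDB feature:

```javascript
// Illustrative: estimate the cardinality of a candidate shard key by
// counting the distinct values of a field across sample documents.
function fieldCardinality(docs, field) {
  var seen = {};
  var count = 0;
  docs.forEach(function (doc) {
    var key = String(doc[field]);
    if (!seen[key]) {
      seen[key] = true;
      count++;
    }
  });
  return count;
}
```

A state field over a large address book would yield at most around fifty distinct values, while a zipcode or phone-number field over the same sample would typically yield many more.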
Some possible shard keys will allow your application to take advantage of the increased write capacity that the cluster can provide, while others will not. Consider the following example where you shard by the values of the default _id field, which holds an ObjectId.
MongoDB computes the ObjectId upon document creation; it is a unique identifier for the object. However, the most significant bits of this value represent a timestamp, which means that they increment in a regular and predictable pattern. Even though this value has high cardinality, when using this value, or any date or other monotonically increasing number, as the shard key, all insert operations will store data in a single chunk, and therefore a single shard. As a result, the write capacity of this shard will define the effective write capacity of the cluster.
A shard key that increases monotonically will not hinder performance if you have a very low insert rate, or if most of your write operations are update() operations distributed through your entire data set. Generally, choose shard keys that have both high cardinality and will distribute write operations across the entire cluster.
Typically, a computed shard key that has some amount of “randomness,” such as ones that include a cryptographic hash (i.e. MD5 or SHA1) of other content in the document, will allow the cluster to scale write operations. However, random shard keys do not typically provide query isolation, which is another important characteristic of shard keys.
The mongos provides an interface for applications to interact with sharded clusters that hides the complexity of data partitioning. A mongos receives queries from applications, and uses metadata from the config server to route queries to the mongod instances with the appropriate data. While the mongos succeeds in making all querying operational in sharded environments, the shard key you select can have a profound effect on query performance.
See also
The mongos and Sharding and config server sections for a more general overview of querying in sharded environments.
The fastest queries in a sharded environment are those that mongos will route to a single shard, using the shard key and the cluster metadata from the config server. For queries that don’t include the shard key, mongos must query all shards, wait for their responses, and then return the result to the application. These “scatter/gather” queries can be long running operations.
If your query includes the first component of a compound shard key [1], the mongos can route the query directly to a single shard, or a small number of shards, which provides better performance. Even if you query values of the shard key that reside in different chunks, the mongos will route queries directly to specific shards.
To select a shard key for a collection:
If this field has low cardinality (i.e. not sufficiently selective), you should add a second field to the shard key to create a compound shard key. The data may become more splittable with a compound shard key.
See
mongos and Querying for more information on query operations in the context of sharded clusters. Specifically the Routing sub-section outlines the procedure that mongos uses to route read operations to the shards.
| [1] | In many ways, you can think of the shard key a cluster-wide unique index. However, be aware that sharded systems cannot enforce cluster-wide unique indexes unless the unique field is in the shard key. Consider the Indexes wiki page for more information on indexes and compound indexes. |
In sharded systems, the mongos performs a merge-sort of all sorted query results from the shards. See the sharded query routing and Use Indexes to Sort Query Results sections for more information.
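The merge-sort that mongos performs can be sketched as a k-way merge over the already-sorted result sets returned by each shard. This is a simplified illustration of the idea, not the actual mongos implementation:

```javascript
// Sketch of the merge-sort mongos performs on sorted query results:
// repeatedly take the smallest front element among the per-shard
// result sets, each of which is already sorted.
function mergeSortedShardResults(resultSets) {
  var merged = [];
  var positions = resultSets.map(function () { return 0; });
  while (true) {
    var best = -1;
    for (var i = 0; i < resultSets.length; i++) {
      if (positions[i] < resultSets[i].length &&
          (best === -1 ||
           resultSets[i][positions[i]] < resultSets[best][positions[best]])) {
        best = i;
      }
    }
    if (best === -1) break;  // every result set is exhausted
    merged.push(resultSets[best][positions[best]]);
    positions[best]++;
  }
  return merged;
}
```

Because each shard returns its portion in sorted order, the router only compares the current front element of each set, so the merge is cheap relative to re-sorting the combined results.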
The most important considerations when choosing a shard key are:
Furthermore:
In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.
It is unlikely that any single, naturally occurring key in your collection will satisfy all requirements of a good shard key. There are three options:
From a decision-making standpoint, begin by finding the field that will provide the required query isolation, ensure that writes will scale across the cluster, and then add an additional field to provide additional cardinality if your primary key does not have sufficient split-ability.
All sharded collections must have an index that starts with the shard key. If you shard a collection that does not yet contain documents and does not have such an index, the shardCollection command will create an index on the shard key. If the collection already contains documents, you must create an appropriate index before using shardCollection.
Changed in version 2.2: The index on the shard key no longer needs to be identical to the shard key. This index can be an index of the shard key itself as before, or a compound index where the shard key is the prefix of the index. This index cannot be a multikey index.
If you have a collection named people, sharded using the field { zipcode: 1 }, and you want to replace this with an index on the field { zipcode: 1, username: 1 }, then:
Create an index on { zipcode: 1, username: 1 }:
db.people.ensureIndex( { zipcode: 1, username: 1 } );
When MongoDB finishes building the index, you can safely drop the existing index on { zipcode: 1 }:
db.people.dropIndex( { zipcode: 1 } );
Warning
The index on the shard key cannot be a multikey index.
As above, an index on { zipcode: 1, username: 1 } can only replace an index on zipcode if there are no array values for the username field.
If you drop the last appropriate index for the shard key, recover by recreating an index on just the shard key.
The balancer sub-process is responsible for redistributing chunks evenly among the shards and ensuring that each member of the cluster is responsible for the same volume of data. This section contains complete documentation of the balancer process and operations. For a higher level introduction see the Balancing and Distribution section.
A balancing round originates from an arbitrary mongos instance in the cluster. When a balancer process is active, the responsible mongos acquires a “lock” by modifying a document in the locks collection in the config database.
By default, the balancer process is always running. When the number of chunks in a collection is unevenly distributed among the shards, the balancer begins migrating chunks from shards with more chunks to shards with fewer chunks. The balancer will continue migrating chunks, one at a time, until the data is evenly distributed among the shards.
While these automatic chunk migrations are crucial for distributing data, they carry some overhead in terms of bandwidth and workload, both of which can impact database performance. As a result, MongoDB attempts to minimize the effect of balancing by only migrating chunks when the distribution of chunks passes the migration thresholds.
The migration process ensures consistency and maximizes availability of chunks during balancing: when MongoDB begins migrating a chunk, the database begins copying the data to the new server and tracks incoming write operations. After migrating chunks, the “from” mongod sends all new writes to the “receiving” server. Finally, mongos updates the chunk record in the config database to reflect the new location of the chunk.
Note
Changed in version 2.0: Before MongoDB version 2.0, large differences in timekeeping (i.e. clock skew) between mongos instances could lead to failed distributed locks, which carries the possibility of data loss, particularly with skews larger than 5 minutes. Always use the network time protocol (NTP) by running ntpd on your servers to minimize clock skew.
Changed in version 2.2: The following thresholds appear first in 2.2; prior to this release, balancing would only commence if the shard with the most chunks had 8 more chunks than the shard with the least number of chunks.
In order to minimize the impact of balancing on the cluster, the balancer will not begin balancing until the distribution of chunks has reached certain thresholds. These thresholds apply to the difference in number of chunks between the shard with the greatest number of chunks and the shard with the least number of chunks. The balancer has the following thresholds:
| Number of Chunks | Migration Threshold |
| Less than 20 | 2 |
| 21-80 | 4 |
| Greater than 80 | 8 |
Once a balancing round starts, the balancer will not stop until the difference between the number of chunks on any two shards is less than two.
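The thresholds in the table above can be expressed as a small function. The behavior at exactly 20 chunks is an assumption on my part, since the table lists “Less than 20” and “21-80” without covering that boundary:

```javascript
// Migration threshold from the table above: the chunk-count
// difference between the most- and least-loaded shards that must be
// reached before a balancing round begins.
function migrationThreshold(numChunks) {
  if (numChunks < 20) return 2;   // "Less than 20" -> 2
  if (numChunks <= 80) return 4;  // "21-80" -> 4 (boundary at 20 assumed)
  return 8;                       // "Greater than 80" -> 8
}
```

For example, a collection with 100 chunks will not trigger balancing until one shard holds at least 8 more chunks than another.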
Note
You can restrict the balancer so that it only operates between specific start and end times. See Schedule the Balancing Window for more information.
The specification of the balancing window is relative to the local time zone of all individual mongos instances in the sharded cluster.
The default chunk size in MongoDB is 64 megabytes.
When chunks grow beyond the specified chunk size a mongos instance will split the chunk in half. This will eventually lead to migrations, when chunks become unevenly distributed among the cluster. The mongos instances will initiate a round of migrations to redistribute data in the cluster.
Chunk size is arbitrary and must account for the following:
For many deployments it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set, but this value is configurable. Be aware of the following limitations when modifying chunk size:
By default, MongoDB will attempt to fill all available disk space with data on every shard as the data set grows. Monitor disk utilization in addition to other performance metrics, to ensure that the cluster always has capacity to accommodate additional data.
You can also configure a “maximum size” for any shard when you add the shard using the maxSize parameter of the addShard command. This will prevent the balancer from migrating chunks to the shard when the value of mem.mapped exceeds the maxSize setting.
See also
MongoDB migrates chunks in a sharded cluster to distribute data evenly among shards. Migrations may be either:
All chunk migrations use the following procedure:
The balancer process sends the moveChunk command to the source shard for the chunk. In this operation the balancer passes the name of the destination shard to the source shard.
The source initiates the move with an internal moveChunk command with the destination shard.
The destination shard begins requesting documents in the chunk, and begins receiving copies of these documents.
After receiving the final document in the chunk, the destination shard initiates a synchronization process to ensure that all changes to the documents in the chunk on the source shard during the migration process exist on the destination shard.
When fully synchronized, the destination shard connects to the config database and updates the chunk location in the cluster metadata. After completing this operation, once there are no open cursors on the chunk, the source shard starts deleting its copy of documents from the migrated chunk.
When _secondaryThrottle is true for moveChunk or the balancer, MongoDB ensures that one secondary member has replicated changes before allowing new chunk migrations.
If your application must detect whether the MongoDB instance it’s connected to is a mongos, use the isMaster command. When a client connects to a mongos, isMaster returns a document with a msg field that holds the string isdbgrid. For example:
{
"ismaster" : true,
"msg" : "isdbgrid",
"maxBsonObjectSize" : 16777216,
"ok" : 1
}
If the application is instead connected to a mongod, the returned document does not include the isdbgrid string.
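An application-side check therefore only needs to inspect the msg field of the isMaster result. The helper name below is illustrative, not part of any driver:

```javascript
// Illustrative helper: given the document returned by the isMaster
// command, report whether the connection is to a mongos.
function isMongos(isMasterResult) {
  return isMasterResult.msg === "isdbgrid";
}
```

A mongod's isMaster response omits the isdbgrid value, so the same check returns false for a direct mongod connection.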
The config database contains information about your sharding configuration and stores the information in a set of collections used by sharding.
Important
Back up the config database before performing any maintenance on the config server.
To access the config database, issue the following command from the mongo shell:
use config
In general, you should never manipulate the content of the config database directly. The config database contains the following collections:
See Config Database Contents for full documentation of these collections and their role in sharded clusters.
When sharding a GridFS store, consider the following:
Most deployments will not need to shard the files collection. The files collection is typically small, and only contains metadata. None of the required keys for GridFS lend themselves to an even distribution in a sharded situation. If you must shard the files collection, use the _id field, possibly in combination with an application field.
Leaving files unsharded means that all the file metadata documents live on one shard. For production GridFS stores you must store the files collection on a replica set.
To shard the chunks collection by { files_id : 1 , n : 1 }, issue commands similar to the following:
db.fs.chunks.ensureIndex( { files_id : 1 , n : 1 } )
db.runCommand( { shardCollection : "test.fs.chunks" , key : { files_id : 1 , n : 1 } } )
You may also want to shard using just the files_id field, as in the following operation:
db.runCommand( { shardCollection : "test.fs.chunks" , key : { files_id : 1 } } )
Note
Changed in version 2.2.
Before 2.2, you had to create an additional index on files_id to shard using only this field.
The default files_id value is an ObjectId. As a result, the values of files_id are always ascending, and applications will insert all new GridFS data to a single chunk and shard. If your write load is too high for a single server to handle, consider a different shard key or use a different value for _id in the files collection.
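One hedged approach, sketched below in plain JavaScript, prefixes a hash of the file name so that new files_id values do not ascend monotonically. The hash function and naming scheme here are assumptions for illustration, not a GridFS API:

```javascript
// Illustrative sketch (not a GridFS API): derive a non-ascending
// files_id by prefixing a simple hash of the file name, so new files
// spread across chunks rather than all landing on one shard.
// The hash function and document shape are assumptions for this example.
function simpleHash(s) {
    var h = 0;
    for (var i = 0; i < s.length; i++) {
        h = (h * 31 + s.charCodeAt(i)) % 100000;
    }
    return h;
}

function makeFilesId(filename) {
    // The hash prefix breaks the monotonic ordering of
    // timestamp-based identifiers such as ObjectId.
    return simpleHash(filename) + "-" + filename;
}

console.log(makeFilesId("report.pdf"));
```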
The following tutorials describe specific sharding procedures:
This document describes how to deploy a sharded cluster for a standalone mongod instance. To deploy a cluster for an existing replica set, see Convert a Replica Set to a Replicated Sharded Cluster.
Before deploying a sharded cluster, see the requirements listed in Requirements for Sharded Clusters.
Warning
Sharding and “localhost” Addresses
If you use either “localhost” or 127.0.0.1 as the hostname portion of any host identifier, for example as the host argument to addShard or the value to the --configdb run time option, then you must use “localhost” or 127.0.0.1 for all host settings for any MongoDB instances in the cluster. If you mix localhost addresses and remote host address, MongoDB will error.
The config server database processes are small mongod instances that store the cluster’s metadata. You must have exactly three instances in production deployments. Each stores a complete copy of the cluster’s metadata. These instances should run on different servers to assure good uptime and data safety.
Since config database mongod instances receive relatively little traffic and demand only a small portion of system resources, you can run the instances on systems that run other cluster components.
By default a mongod --configsvr process stores its data files in the /data/configdb directory and listens on port 27019. You can specify a different location using the dbpath run-time option. In addition to configsvr, use other mongod runtime options as needed.
To create a data directory for each config server, issue a command similar to the following for each:
mkdir /data/db/config
To start each config server, issue a command similar to the following for each:
mongod --configsvr --dbpath <path> --port <port>
The mongos instance routes queries and operations to the appropriate shards and interacts with the config server instances. All client operations targeting a cluster go through mongos instances.
mongos instances are lightweight and do not require data directories. A cluster typically has several instances. For example, you might run one mongos instance on each of your application servers, or you might run a mongos instance on each of the servers running a mongod process.
You must specify resolvable hostnames [1] for the 3 config servers when starting the mongos instance. You specify the hostnames either in the configuration file or as command line parameters.
The mongos instance runs on the default MongoDB TCP port: 27017.
To start a mongos instance running on the mongos0.example.net host that connects to the config server instances running on the following hosts:
You would issue the following command:
mongos --configdb mongoc0.example.net,mongoc1.example.net,mongoc2.example.net
| [1] | Use DNS names for the config servers rather than explicit IP addresses for operational flexibility. If you do not use resolvable hostnames, you cannot change the config server names or IP addresses without restarting every mongos and mongod instance. |
You must deploy at least one shard or one replica set to begin. In a production cluster, each shard is a replica set. You may add additional shards to a running cluster later. For instructions on deploying replica sets, see Deploy a Replica Set.
This procedure assumes you have two active and initiated replica sets and describes how to add the first two shards to the cluster.
First, connect to one of the mongos instances. For example, if a mongos is accessible at mongos0.example.net on port 27017, issue the following command:
mongo mongos0.example.net
Then, from a mongo shell connected to the mongos instance, call the sh.addShard() method for each shard that you want to add to the cluster:
sh.addShard( "s0/sfo30.example.net" )
sh.addShard( "s1/sfo40.example.net" )
If the host you are adding is a member of a replica set, you must specify the name of the replica set. mongos will discover the names of other members of the replica set based on the name and the hostname you provide.
These operations add two shards, provided by:
All shards should be replica sets
Changed in version 2.0.3.
After version 2.0.3, you may use the above form to add replica sets to a cluster. The cluster will automatically discover the other members of the replica set and note their names accordingly.
Before version 2.0.3, you had to specify the shard in the following form: the replica set name, followed by a forward slash, followed by a comma-separated list of seeds for the replica set. For example, if the name of the replica set is sh0, and the replica set has three members, then your sh.addShard command might resemble:
sh.addShard( "sh0/sfo30.example.net,sfo31.example.net,sfo32.example.net" )
The sh.addShard() helper in the mongo shell is a wrapper for the addShard database command.
While sharding operates on a per-collection basis, you must enable sharding for each database that holds collections you want to shard. A single cluster may have many databases, with each database housing collections.
Use the following operation in a mongo shell session connected to a mongos instance in your cluster:
sh.enableSharding("records")
Where records is the name of the database that holds the collection you want to shard. sh.enableSharding() is a wrapper around the enableSharding database command. You can enable sharding for multiple databases in the cluster.
You can enable sharding on a per-collection basis. Because MongoDB uses “range based sharding,” you must specify the shard key MongoDB uses to distribute your documents among the shards. For more information, see the overview of shard keys.
To enable sharding for a collection, use the sh.shardCollection() helper in the mongo shell. The helper provides a wrapper around the shardCollection database command and has the following prototype form:
sh.shardCollection("<database>.<collection>", shard-key-pattern)
Replace the <database>.<collection> string with the full namespace of your database, which consists of the name of your database, a dot (e.g. .), and the full name of the collection. The shard-key-pattern represents your shard key, which you specify in the same form as you would an index key pattern.
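The namespace convention described above can be sketched as follows; parseNamespace is an illustrative helper, not a shell built-in. Note that only the first dot separates the database name, since collection names may themselves contain dots (e.g. fs.chunks):

```javascript
// Split a "<database>.<collection>" namespace string into its parts,
// as sh.shardCollection() expects. Only the first dot separates the
// database name; collection names may contain further dots.
// The helper name is illustrative.
function parseNamespace(ns) {
    var dot = ns.indexOf(".");
    if (dot === -1) throw new Error("not a full namespace: " + ns);
    return { db: ns.substring(0, dot), collection: ns.substring(dot + 1) };
}

console.log(parseNamespace("records.people"));  // db: "records", collection: "people"
console.log(parseNamespace("test.fs.chunks"));  // db: "test", collection: "fs.chunks"
```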
Consider the following example invocations of sh.shardCollection():
sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } )
sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } )
sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } )
sh.shardCollection("events.alerts", { "hashed_id": 1 } )
In order, these operations shard:
The people collection in the records database using the shard key { "zipcode": 1, "name": 1 }.
This shard key distributes documents by the value of the zipcode field. If a number of documents have the same value for this field, then that chunk will be splittable by the values of the name field.
The addresses collection in the people database using the shard key { "state": 1, "_id": 1 }.
This shard key distributes documents by the value of the state field. If a number of documents have the same value for this field, then that chunk will be splittable by the values of the _id field.
The chairs collection in the assets database using the shard key { "type": 1, "_id": 1 }.
This shard key distributes documents by the value of the type field. If a number of documents have the same value for this field, then that chunk will be splittable by the values of the _id field.
The alerts collection in the events database using the shard key { "hashed_id": 1 }.
This shard key distributes documents by the value of the hashed_id field. Presumably this is a computed value that holds the hash of some value in your documents and is able to evenly distribute documents throughout your cluster.
This document describes how to add a shard to an existing sharded cluster. As your data set grows you must add additional shards to a cluster to provide additional capacity. For additional sharding procedures, see Sharded Cluster Administration.
Distributing chunks among your cluster requires some capacity to support the migration process. When adding a shard to your cluster, you should always ensure that your cluster has enough capacity to support the migration without affecting legitimate production traffic.
In production environments, all shards should be replica sets. Furthermore, all interaction with your sharded cluster should pass through a mongos instance. This tutorial assumes that you already have a mongo shell connection to a mongos instance.
Tell the cluster where to find the individual shards. You can do this using the addShard command:
db.runCommand( { addShard: "mongodb0.example.net", name: "mongodb0" } )
Or you can use the sh.addShard() helper in the mongo shell:
sh.addShard( "[hostname]:[port]" )
Replace [hostname] and [port] with the hostname and TCP port number of where the shard is accessible.
Warning
Do not use localhost for the hostname unless your configuration server is also running on localhost.
For example:
sh.addShard( "mongodb0.example.net:27027" )
If mongodb0.example.net:27027 is a member of a replica set, call the sh.addShard() method with an argument that resembles the following:
sh.addShard( "<setName>/mongodb0.example.net:27027" )
Replace <setName> with the name of the replica set, and MongoDB will discover all other members of the replica set.
Note
In production deployments, all shards should be replica sets.
Changed in version 2.0.3.
Before version 2.0.3, you had to specify the shard in the following form:
replicaSetName/<seed1>,<seed2>,<seed3>
For example, if the name of the replica set is repl0, then your sh.addShard command would be:
sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" )
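As a sketch, the pre-2.0.3 argument string can be assembled from a set name and seed list; shardSeedString is an illustrative helper, not a shell built-in:

```javascript
// Build the "replicaSetName/<seed1>,<seed2>,..." string required by
// addShard before version 2.0.3 from a set name and a list of
// host:port seeds. The helper name is illustrative.
function shardSeedString(setName, hosts) {
    return setName + "/" + hosts.join(",");
}

var arg = shardSeedString("repl0", [
    "mongodb0.example.net:27027",
    "mongodb1.example.net:27017",
    "mongodb2.example.net:27017"
]);
console.log(arg);
// You would then call: sh.addShard( arg )
```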
Repeat this step for each shard in your cluster.
Optional
You may specify a “name” as an argument to addShard, as follows:
db.runCommand( { addShard: "mongodb0.example.net", name: "mongodb0" } )
You cannot specify a name for a shard using the sh.addShard() helper in the mongo shell. If you use the helper or do not specify a shard name, then MongoDB will assign a name upon creation.
Note
It may take some time for chunks to migrate to the new shard because the system must copy data from one mongod instance to another while maintaining data consistency.
For an overview of the balancing operation, see the Balancing and Distribution section.
For additional information on balancing, see the Balancing Internals section.
This document describes how to safely migrate data from a shard when you need to decommission it. You may also need to remove shards as part of hardware reorganization and data migration.
Do not use this procedure to migrate an entire cluster to new hardware. To migrate an entire shard to new hardware, migrate individual shards as if they were independent replica sets.
To remove a shard, you will:
Complete this procedure by connecting to any mongos in the cluster using the mongo shell.
You can only remove a shard by its shard name. To discover or confirm the name of a shard, use the listShards command, printShardingStatus command, or sh.status() shell helper.
The example commands in this document remove a shard named mongodb0.
Note
To successfully migrate data from a shard, the balancer process must be active. Check the balancer state using the sh.getBalancerState() helper in the mongo shell. For more information, see the section on balancer operations.
Start by running the removeShard command. This begins “draining” chunks from the shard you are removing.
db.runCommand( { removeshard: "mongodb0" } )
This operation returns immediately, with the following response:
{ msg : "draining started successfully" , state : "started" , shard : "mongodb0" , ok : 1 }
Depending on your network capacity and the amount of data in your cluster, this operation can take from a few minutes to several days to complete.
To check the progress of the migration, run removeShard again at any stage of the process, as follows:
db.runCommand( { removeshard: "mongodb0" } )
The output resembles the following document:
{ msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 }
In the remaining subdocument, a counter displays the remaining number of chunks that MongoDB must migrate to other shards and the number of MongoDB databases that have “primary” status on this shard.
Continue checking the status of the removeshard command until the number of chunks remaining is 0. Then proceed to the next step.
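A sketch of this completion check, in plain JavaScript; the helper name is illustrative and the field names follow the example output above:

```javascript
// Interpret the document returned by removeShard to decide whether
// draining has finished. Field names follow the example output shown
// above; the helper name is illustrative.
function drainingComplete(status) {
    if (status.state === "completed") return true;
    // While draining is ongoing, the remaining subdocument counts
    // chunks still to migrate and databases with primary status here.
    return status.remaining !== undefined &&
           status.remaining.chunks === 0 &&
           status.remaining.dbs === 0;
}

var ongoing = { msg: "draining ongoing", state: "ongoing",
                remaining: { chunks: 42, dbs: 1 }, ok: 1 };
console.log(drainingComplete(ongoing)); // false
```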
Databases with non-sharded collections store those collections on a single shard known as the primary shard for that database. The following step is necessary only when the shard to remove is also the primary shard for one or more databases.
Issue the following command at the mongo shell:
db.runCommand( { movePrimary: "myapp", to: "mongodb1" })
This command migrates all remaining non-sharded data in the database named myapp to the shard named mongodb1.
Warning
Do not run the movePrimary until you have finished draining the shard.
This command will not return until MongoDB completes moving all data, which may take a long time. The response from this command will resemble the following:
{ "primary" : "mongodb1", "ok" : 1 }
Run removeShard again to clean up all metadata information and finalize the removal, as follows:
db.runCommand( { removeshard: "mongodb0" } )
A success message appears at completion:
{ msg: "remove shard completed successfully" , state: "completed", host: "mongodb0", ok : 1 }
When the value of “state” is “completed”, you may safely stop the mongodb0 shard.
The unique constraint on indexes ensures that only one document can have a given value for a field in a collection. For sharded collections these unique indexes cannot enforce uniqueness because insert and indexing operations are local to each shard. [1]
If you need to ensure that a field is always unique in all collections in a sharded environment, there are two options:
Enforce uniqueness of the shard key.
MongoDB can enforce uniqueness for the shard key. For compound shard keys, MongoDB will enforce uniqueness on the entire key combination, and not for a specific component of the shard key.
Use a secondary collection to enforce uniqueness.
Create a minimal collection that only contains the unique field and a reference to a document in the main collection. If you always insert into a secondary collection before inserting to the main collection, MongoDB will produce an error if you attempt to use a duplicate key.
Note
If you have a small data set, you may not need to shard this collection and you can create multiple unique indexes. Otherwise you can shard on a single unique key.
Regardless of method, be aware that writes to the MongoDB database are “fire and forget,” or “unsafe” by default: they will not return errors to the client if MongoDB rejects a write operation because of a duplicate key or other error. As a result if you want to enforce unique keys you must use the safe write setting in your driver. See your driver’s documentation on getLastError for more information.
| [1] | If you specify a unique index on a sharded collection, MongoDB will be able to enforce uniqueness only among the documents located on a single shard at the time of creation. |
To shard a collection using the unique constraint, specify the shardCollection command in the following form:
db.runCommand( { shardCollection : "test.users" , key : { email : 1 } , unique : true } );
Remember that the _id field index is always unique. By default, MongoDB inserts an ObjectId into the _id field. However, you can manually insert your own value into the _id field and use this as the shard key. To use the _id field as the shard key, use the following operation:
db.runCommand( { shardCollection : "test.users" , key : { _id : 1 } , unique : true } )
Warning
In any sharded collection where you are not sharding by the _id field, you must ensure uniqueness of the _id field. The best way to ensure _id is always unique is to use ObjectId, or another universally unique identifier (UUID).
In most cases, the best shard keys are compound keys that include elements that permit write scaling and query isolation, as well as high cardinality. These ideal shard keys are often not the same fields that require uniqueness, so enforcing uniqueness requires a different approach.
If you cannot use a unique field as the shard key or if you need to enforce uniqueness over multiple fields, you must create another collection to act as a “proxy collection”. This collection must contain both a reference to the original document (i.e. its ObjectId) and the unique key.
If you must shard this “proxy” collection, then shard on the unique key using the above procedure; otherwise, you can simply create multiple unique indexes on the collection.
Consider the following example document in the “proxy collection”:
{
    "_id" : ObjectId("..."),
    "email" : "..."
}
The _id field holds the ObjectId of the document it reflects, and the email field is the field on which you want to ensure uniqueness.
To shard this collection, use the following operation using the email field as the shard key:
db.runCommand( { shardCollection : "records.proxy" , key : { email : 1 } , unique : true } );
If you do not need to shard the proxy collection, use the following command to create a unique index on the email field:
db.proxy.ensureIndex( { "email" : 1 }, {unique : true} )
You may create multiple unique indexes on this collection if you do not plan to shard the proxy collection.
To insert documents, use the following procedure in the JavaScript shell:
use records;
var primary_id = ObjectId();
db.proxy.insert({
    "_id" : primary_id,
    "email" : "example@example.net"
})
// if the above operation returns successfully,
// then continue:
db.information.insert({
    "_id" : primary_id,
    "email" : "example@example.net",
    // additional information...
})
You must insert a document into the proxy collection first. If this operation succeeds, the email field is unique, and you may continue by inserting the actual document into the information collection.
See
The full documentation of: db.collection.ensureIndex() and shardCollection.
Following this tutorial, you will convert a single 3-member replica set to a cluster that consists of 2 shards. Each shard will consist of an independent 3-member replica set.
The tutorial uses a test environment running on a local UNIX-like system. You should feel encouraged to “follow along at home.” If you need to perform this process in a production environment, notes throughout the document indicate procedural differences.
The procedure, from a high level, is as follows:
Install MongoDB according to the instructions in the MongoDB Installation Tutorial.
If you have an existing MongoDB replica set deployment, you can omit this step and continue from Deploy Sharding Infrastructure.
Use the following sequence of steps to configure and deploy a replica set and to insert test data.
Create the following directories for the first replica set instance, named firstset:
To create directories, issue the following command:
mkdir -p /data/example/firstset1 /data/example/firstset2 /data/example/firstset3
In a separate terminal window or GNU Screen window, start three mongod instances by running each of the following commands:
mongod --dbpath /data/example/firstset1 --port 10001 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset2 --port 10002 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset3 --port 10003 --replSet firstset --oplogSize 700 --rest
Note
The --oplogSize 700 option restricts the size of the operation log (i.e. oplog) for each mongod instance to 700MB. Without the --oplogSize option, each mongod reserves approximately 5% of the free disk space on the volume. By limiting the size of the oplog, each instance starts more quickly. Omit this setting in production environments.
In a mongo shell session in a new terminal, connect to the mongod instance on port 10001 by running the following command. If you are in a production environment, first read the note below.
mongo localhost:10001/admin
Note
Above and hereafter, if you are running in a production environment or are testing this process with mongod instances on multiple systems, replace “localhost” with a resolvable domain, hostname, or the IP address of your system.
In the mongo shell, initialize the first replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
{"_id" : "firstset", "members" : [{"_id" : 1, "host" : "localhost:10001"},
{"_id" : 2, "host" : "localhost:10002"},
{"_id" : 3, "host" : "localhost:10003"}
]}})
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
In the mongo shell, create and populate a new collection by issuing the following sequence of JavaScript operations:
use test
switched to db test
people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina", "Katie", "Jeff"];
for(var i=0; i<1000000; i++){
name = people[Math.floor(Math.random()*people.length)];
user_id = i;
boolean = [true, false][Math.floor(Math.random()*2)];
added_at = new Date();
number = Math.floor(Math.random()*10001);
db.test_collection.save({"name":name, "user_id":user_id, "boolean": boolean, "added_at":added_at, "number":number });
}
The above operations add one million documents to the collection test_collection. This can take several minutes, depending on your system.
The script adds the documents in the following form:
{ "_id" : ObjectId("4ed5420b8fc1dd1df5886f70"), "name" : "Greg", "user_id" : 4, "boolean" : true, "added_at" : ISODate("2011-11-29T20:35:23.121Z"), "number" : 74 }
This procedure creates the three config databases that store the cluster’s metadata.
Note
For development and testing environments, a single config database is sufficient. In production environments, use three config databases. Because config instances store only the metadata for the sharded cluster, they have minimal resource requirements.
Create the following data directories for three config database instances:
Issue the following command at the system prompt:
mkdir -p /data/example/config1 /data/example/config2 /data/example/config3
In a separate terminal window or GNU Screen window, start the config databases by running the following commands:
mongod --configsvr --dbpath /data/example/config1 --port 20001
mongod --configsvr --dbpath /data/example/config2 --port 20002
mongod --configsvr --dbpath /data/example/config3 --port 20003
In a separate terminal window or GNU Screen window, start a mongos instance by running the following command:
mongos --configdb localhost:20001,localhost:20002,localhost:20003 --port 27017 --chunkSize 1
Note
If you are using the collection created earlier or are just experimenting with sharding, you can use a small --chunkSize (1MB works well). The default chunkSize of 64MB means that your cluster must have 64MB of data before MongoDB’s automatic sharding begins working.
In production environments, do not use a small shard size.
The configdb options specify the configuration databases (e.g. localhost:20001, localhost:20002, and localhost:20003). The mongos instance runs on the default “MongoDB” port (i.e. 27017), while the config databases themselves run on ports in the 20001 series. In this example, you may omit the --port 27017 option, as 27017 is the default port.
Add the first shard in mongos. In a new terminal window or GNU Screen session, add the first shard, according to the following procedure:
Connect to the mongos with the following command:
mongo localhost:27017/admin
Add the first shard to the cluster by issuing the addShard command:
db.runCommand( { addShard : "firstset/localhost:10001,localhost:10002,localhost:10003" } )
Observe the following message, which denotes success:
{ "shardAdded" : "firstset", "ok" : 1 }
This procedure deploys a second replica set. This closely mirrors the process used to establish the first replica set above, omitting the test data.
Create the following data directories for the members of the second replica set, named secondset:
In three new terminal windows, start three instances of mongod with the following commands:
mongod --dbpath /data/example/secondset1 --port 10004 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset2 --port 10005 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset3 --port 10006 --replSet secondset --oplogSize 700 --rest
Note
As above, the second replica set uses the smaller oplogSize configuration. Omit this setting in production environments.
In the mongo shell, connect to one mongod instance by issuing the following command:
mongo localhost:10004/admin
In the mongo shell, initialize the second replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
{"_id" : "secondset",
"members" : [{"_id" : 1, "host" : "localhost:10004"},
{"_id" : 2, "host" : "localhost:10005"},
{"_id" : 3, "host" : "localhost:10006"}
]}})
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
Add the second replica set to the cluster. Connect to the mongos instance created in the previous procedure and issue the following sequence of commands:
use admin
db.runCommand( { addShard : "secondset/localhost:10004,localhost:10005,localhost:10006" } )
This command returns the following success message:
{ "shardAdded" : "secondset", "ok" : 1 }
Verify that both shards are properly configured by running the listShards command. The command and example output appear below:
db.runCommand({listShards:1})
{
"shards" : [
{
"_id" : "firstset",
"host" : "firstset/localhost:10001,localhost:10003,localhost:10002"
},
{
"_id" : "secondset",
"host" : "secondset/localhost:10004,localhost:10006,localhost:10005"
}
],
"ok" : 1
}
MongoDB must have sharding enabled on both the database and collection levels.
Issue the enableSharding command. The following example enables sharding on the “test” database:
db.runCommand( { enableSharding : "test" } )
{ "ok" : 1 }
MongoDB uses the shard key to distribute documents between shards. Once selected, you cannot change the shard key. Good shard keys:
Typically shard keys are compound, comprising some sort of hash and some other primary key. Selecting a shard key depends on your data set, application architecture, and usage pattern, and is beyond the scope of this document. For the purposes of this example, we will shard on the “number” key. This typically would not be a good shard key for production deployments.
Create the index with the following procedure:
use test
db.test_collection.ensureIndex({number:1})
See also
The Shard Key Overview and Shard Key sections.
Issue the following command:
use admin
db.runCommand( { shardCollection : "test.test_collection", key : {"number":1} })
{ "collectionsharded" : "test.test_collection", "ok" : 1 }
The collection test_collection is now sharded!
Over the next few minutes the Balancer begins to redistribute chunks of documents. You can confirm this activity by switching to the test database and running db.stats() or db.printShardingStatus().
As clients insert additional documents into this collection, mongos distributes the documents evenly between the shards.
In the mongo shell, issue the following commands to return statistics for the cluster:
use test
db.stats()
db.printShardingStatus()
Example output of the db.stats() command:
{
"raw" : {
"firstset/localhost:10001,localhost:10003,localhost:10002" : {
"db" : "test",
"collections" : 3,
"objects" : 973887,
"avgObjSize" : 100.33173458522396,
"dataSize" : 97711772,
"storageSize" : 141258752,
"numExtents" : 15,
"indexes" : 2,
"indexSize" : 56978544,
"fileSize" : 1006632960,
"nsSizeMB" : 16,
"ok" : 1
},
"secondset/localhost:10004,localhost:10006,localhost:10005" : {
"db" : "test",
"collections" : 3,
"objects" : 26125,
"avgObjSize" : 100.33286124401914,
"dataSize" : 2621196,
"storageSize" : 11194368,
"numExtents" : 8,
"indexes" : 2,
"indexSize" : 2093056,
"fileSize" : 201326592,
"nsSizeMB" : 16,
"ok" : 1
}
},
"objects" : 1000012,
"avgObjSize" : 100.33176401883178,
"dataSize" : 100332968,
"storageSize" : 152453120,
"numExtents" : 23,
"indexes" : 4,
"indexSize" : 59071600,
"fileSize" : 1207959552,
"ok" : 1
}
Example output of the db.printShardingStatus() command:
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "firstset", "host" : "firstset/localhost:10001,localhost:10003,localhost:10002" }
{ "_id" : "secondset", "host" : "secondset/localhost:10004,localhost:10006,localhost:10005" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "firstset" }
test.test_collection chunks:
secondset 5
firstset 186
[...]
In a few moments you can run these commands for a second time to demonstrate that chunks are migrating from firstset to secondset.
When this procedure is complete, you will have converted a replica set into a cluster where each shard is itself a replica set.
If your sharded cluster holds a small data set, you can connect to a mongos and use the mongodump tool. You can use this approach if the following is true:
It’s possible to store the entire backup on one system or on a single storage device. Consider both backups of entire instances and incremental dumps of data.
The state of the database at the beginning of the operation is not significantly different than the state of the database at the end of the backup.
Your application must be able to operate given a copy of the data that reflects many moments in time. If the data collected by the backup operation is not sufficient to restore from, using mongodump for your backups is not viable.
The backup can run and complete without affecting the performance of the cluster.
If these conditions are not all true, then this backup method will not support the needs of your deployment. Read Sharded Cluster Backup Considerations for a high-level overview of important considerations as well as a list of alternate backup tutorials.
Note
If you use mongodump without specifying a database or collection, the output will contain both the collection data and the sharding config metadata from the config servers.
You cannot use the --oplog option for mongodump when dumping from a mongos. This option is only available when running directly against a replica set member.
To perform a backup of a sharded cluster by connecting directly with mongodump, use the following operation at your system’s prompt:
mongodump --journal --host mongos3.example.net --port 27017
This will create a database dump of the sharded cluster accessible via the mongos instance listening on port 27017 of mongos3.example.net.
The dump produced by this operation will effectively “unshard” your data: you must re-shard and re-balance the data when you restore.
This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of the mongod instances. An alternate procedure uses mongodump to create binary database dumps when file system snapshots are not available. See Create Backup of a Sharded Cluster with Database Dumps for the alternate procedure.
See Sharded Cluster Backup Considerations for a high-level overview of backing up a sharded cluster, as well as links to other tutorials that provide alternate procedures.
Important
To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.
In this procedure, you will stop the cluster balancer and take a backup of the config database, and then take backups of each shard in the cluster using a file system snapshot tool. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the file system snapshots; otherwise the snapshot will only approximate a moment in time.
For approximate point-in-time snapshots, you can improve the quality of the backup while minimizing impact on the cluster by taking the backup from a secondary member of the replica set that provides each shard.
Disable the balancer process that equalizes the distribution of data among the shards. To disable the balancer, use the sh.stopBalancer() method in the mongo shell, and see the Disable the Balancer procedure.
Warning
It is essential that you stop the balancer before creating backups. If the balancer remains active, your resulting backups could have duplicate data or miss some data, as chunks migrate while recording backups.
Lock one member of each replica set in each shard so that your backups reflect the state of your database at the nearest possible approximation of a single moment in time. Lock these mongod instances in as short an interval as possible, using the db.fsyncLock() method in the mongo shell.
To lock or freeze a sharded cluster, you must:
Use mongodump to backup one of the config servers. This backs up the cluster’s metadata. You only need to back up one config server, as they all hold the same data.
Issue this command against one of the config mongod instances or via the mongos:
mongodump --db config
Back up the replica set members of the shards that you locked. You may back up the shards in parallel. For each shard, create a snapshot. Use the procedures in Using Block Level Backup Methods.
Unlock all locked replica set members of each shard using the db.fsyncUnlock() method in the mongo shell.
Restore the balancer with the sh.startBalancer() method according to the Disable the Balancer procedure.
Use the following command sequence when connected to the mongos with the mongo shell:
use config
sh.startBalancer()
This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses mongodump to create dumps of the mongod instance. An alternate procedure uses file system snapshots to capture the backup data, and may be more efficient in some situations if your system configuration allows file system backups. See Create Backup of a Sharded Cluster with Filesystem Snapshots.
See Sharded Cluster Backup Considerations for a high-level overview of backing up a sharded cluster, as well as links to other tutorials that provide alternate procedures.
Important
To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.
In this procedure, you will stop the cluster balancer and take a backup of the config database, and then take backups of each shard in the cluster using mongodump to capture the backup data. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking these backups; otherwise the backup will only approximate a moment in time.
For approximate point-in-time snapshots, you can improve the quality of the backup while minimizing impact on the cluster by taking the backup from a secondary member of the replica set that provides each shard.
Disable the balancer process that equalizes the distribution of data among the shards. To disable the balancer, use the sh.stopBalancer() method in the mongo shell, and see the Disable the Balancer procedure.
Warning
It is essential that you stop the balancer before creating backups. If the balancer remains active, your resulting backups could have duplicate data or miss some data, as chunks migrate while recording backups.
Lock one member of each replica set in each shard so that your backups reflect the state of your database at the nearest possible approximation of a single moment in time. Lock these mongod instances in as short an interval as possible.
To lock or freeze a sharded cluster, you must:
Shut down one member of each replica set.
Ensure that the oplog has sufficient capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See Oplog for more information.
Shut down one of the config servers to prevent all metadata changes during the backup process.
Use mongodump to backup one of the config servers. This backs up the cluster’s metadata. You only need to back up one config server, as they all hold the same data.
Issue this command against one of the config mongod instances or via the mongos:
mongodump --journal --db config
Back up the replica set members of the shards that shut down using mongodump and specifying the --dbpath option. You may back up the shards in parallel. Consider the following invocation:
mongodump --journal --dbpath /data/db/ --out /data/backup/
You must run this command on the system where the mongod ran. This operation will use journaling and create a dump of the entire mongod instance with data files stored in /data/db/. mongodump will write the output of this dump to the /data/backup/ directory.
Restart all stopped replica set members of each shard as normal and allow them to catch up with the state of the primary.
Restore the balancer with the sh.startBalancer() method according to the Disable the Balancer procedure.
Use the following command sequence when connected to the mongos with the mongo shell:
use config
sh.startBalancer()
Restoring a single shard from backup with other unaffected shards requires a number of special considerations and practices. This document outlines the additional tasks you must perform when restoring a single shard.
Consider the following resources on backups in general as well as backup and restoration of sharded clusters specifically:
Always restore sharded clusters as a whole. When you restore a single shard, keep in mind that the balancer process might have moved chunks to or from this shard since the last backup. If that’s the case, you must manually move those chunks, as described in this procedure.
The procedure outlined in this document addresses how to restore an entire sharded cluster. The following tutorials describe related backup procedures in detail:
The exact procedure used to restore a database depends on the method used to capture the backup. See the Backup and Restoration Strategies document for an overview of backups with MongoDB, as well as Sharded Cluster Backup Considerations which provides an overview of the high level concepts important for backing up sharded clusters.
Stop all mongod and mongos processes.
If shard hostnames have changed, you must manually update the shards collection in the Config Database Contents to use the new hostnames. Do the following:
Start the three config servers by issuing commands similar to the following, using values appropriate to your configuration:
mongod --configsvr --dbpath /data/configdb --port 27018
Restore the Config Database Contents on each config server.
Start one mongos instance.
Update the Config Database Contents collection named shards to reflect the new hostnames.
Restore the following:
Restart all the mongos instances.
Restart all the mongod instances.
Connect to a mongos instance from a mongo shell and use the db.printShardingStatus() method to ensure that the cluster is operational, as follows:
db.printShardingStatus()
show collections
In a sharded cluster, the balancer process is responsible for distributing sharded data around the cluster, so that each shard has roughly the same amount of data.
However, when creating backups from a sharded cluster, it’s important that you disable the balancer while taking backups, to ensure that no chunk migrations affect the content captured by the backup procedure. Using the procedure outlined in the section Disable the Balancer, you can stop the balancer process temporarily using a manual process. As an alternative, you can use this procedure to define a balancing window so that the balancer is always disabled during your automated backup operation.
If you have an automated backup schedule, you can disable all balancing operations for a period of time. For instance, consider the following command:
use config
db.settings.update( { _id : "balancer" }, { $set : { activeWindow : { start : "6:00", stop : "23:00" } } }, true )
This operation configures the balancer to run between 6:00am and 11:00pm, server time. Schedule your backup operation to run and complete outside this time. Ensure that the backup can complete during the window when the balancer is not running and that the balancer can effectively balance the collection among the shards in the window allotted to it.
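The activeWindow semantics can be modeled with a few lines of Python. This is only an illustration of the window logic, not code MongoDB runs, and it assumes the start/stop strings use the 24-hour H:MM form shown above:

```python
from datetime import time

def in_window(now, start="6:00", stop="23:00"):
    """Return True if `now` (a datetime.time) falls inside the
    balancer's activeWindow. Handles windows that cross midnight."""
    def parse(s):
        h, m = s.split(":")
        return time(int(h), int(m))
    start_t, stop_t = parse(start), parse(stop)
    if start_t <= stop_t:
        return start_t <= now < stop_t
    return now >= start_t or now < stop_t  # window crosses midnight

# With the 6:00-23:00 window above, noon is inside; 2:00 am is not.
print(in_window(time(12, 0)), in_window(time(2, 0)))
```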
This page lists the core administrative documentation and the administration tutorials, and includes links to administrative documentation for replica sets, sharding, and indexing.
The following documents comprise the overview of core topics in MongoDB administration:
The command line and configuration file interfaces provide MongoDB administrators with a large number of options and settings for controlling the operation of the database system. This document provides an overview of common configurations and examples of best-practice configurations for common use cases.
While both interfaces provide access to the same collection of options and settings, this document primarily uses the configuration file interface. If you run MongoDB using a control script or a package for your operating system, you likely already have a configuration file located at /etc/mongodb.conf. Confirm this by checking the content of the /etc/init.d/mongod or /etc/rc.d/mongod script to ensure that the control scripts start the mongod with the appropriate configuration file (see below.)
To start a MongoDB instance using this configuration, issue a command in one of the following forms:
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf
Modify the values in the /etc/mongodb.conf file on your system to control the configuration of your database instance.
Consider the following basic configuration:
fork = true
bind_ip = 127.0.0.1
port = 27017
quiet = true
dbpath = /srv/mongodb
logpath = /var/log/mongodb/mongod.log
logappend = true
journal = true
For most standalone servers, this is a sufficient base configuration. It makes several assumptions, but consider the following explanation:
“fork” is true, which enables a daemon mode for mongod that detaches (i.e. “forks”) the MongoDB process from the current session and allows you to run the database as a conventional server.
bind_ip is 127.0.0.1, which forces the server to only listen for requests on the localhost IP. Only bind to secure interfaces that the application-level systems can access, with access control provided by system network filtering (i.e. “firewall”) systems.
port is 27017, which is the default MongoDB port for database instances. MongoDB can bind to any port. You can also filter access based on port using network filtering tools.
Note
UNIX-like systems require superuser privileges to attach processes to ports lower than 1024.
quiet is true. This disables all but the most critical entries in the output/log file. In normal operation this is preferable to avoid log noise. In diagnostic or testing situations, set this value to false. Use setParameter to modify this setting during run time.
dbpath is /srv/mongodb, which specifies where MongoDB will store its data files. /srv/mongodb and /var/lib/mongodb are popular locations. The user account that mongod runs under will need read and write access to this directory.
logpath is /var/log/mongodb/mongod.log which is where mongod will write its output. If you do not set this value, mongod writes all output to standard output (e.g. stdout.)
logappend is true, which ensures that mongod does not overwrite an existing log file following the server start operation.
journal is true, which enables journaling. Journaling ensures single-instance write durability. 64-bit builds of mongod enable journaling by default, so this setting may be redundant.
Given the default configuration, some of these values may be redundant. However, in many situations explicitly stating the configuration increases overall system intelligibility.
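The file format itself is a simple set of key = value lines. As an illustration only (this is not mongod’s actual configuration parser), a small Python sketch can read such a file:

```python
def parse_config(text):
    """Parse simple `key = value` configuration lines, skipping
    blank lines and `#` comments, into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

sample = """\
fork = true
bind_ip = 127.0.0.1
port = 27017
dbpath = /srv/mongodb
"""
print(parse_config(sample))
```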
The following collection of configuration options are useful for limiting access to a mongod instance. Consider the following:
bind_ip = 127.0.0.1,10.8.0.10,192.168.4.24
nounixsocket = true
auth = true
Consider the following explanation for these configuration decisions:
“bind_ip” has three values: 127.0.0.1, the localhost interface; 10.8.0.10, a private IP address typically used for local networks and VPN interfaces; and 192.168.4.24, a private network interface typically used for local networks.
Because production MongoDB instances need to be accessible from multiple database servers, it is important to bind MongoDB to multiple interfaces that are accessible from your application servers. At the same time it’s important to limit these interfaces to interfaces controlled and protected at the network layer.
“nounixsocket” is true, which disables the UNIX socket, which is otherwise enabled by default. This limits access on the local system. This is desirable when running MongoDB on systems with shared access, but in most situations has minimal impact.
“auth” is true, which enables the authentication system within MongoDB. When enabled, you will need to create user credentials by connecting over the localhost interface the first time you log in.
See also
Replica set configuration is straightforward, and only requires that the replSet have a value that is consistent among all members of the set. Consider the following:
replSet = set0
Use descriptive names for sets. Once configured use the mongo shell to add hosts to the replica set.
See also
To enable authentication for the replica set, add the following option:
keyFile = /srv/mongodb/keyfile
New in version 1.8: for replica sets, and 1.9.1 for sharded replica sets.
Setting keyFile enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary, but must be the same on all members of the replica set and on all mongos instances that connect to the set. The key file must be less than one kilobyte in size, may only contain characters in the base64 set, and must not have group or “world” permissions on UNIX systems.
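The constraints on the key file can be expressed as a short check. The following Python sketch mirrors the rules stated above; it is not mongod’s own validation code, and base64.b64encode merely stands in for real key generation (typically done with a tool such as openssl):

```python
import base64
import string

B64_CHARS = set(string.ascii_letters + string.digits + "+/=")

def valid_keyfile(data: bytes) -> bool:
    """Check the documented constraints: under one kilobyte, and only
    characters from the base64 set (whitespace between lines ignored).
    Illustrative only; not mongod's own validation."""
    if len(data) >= 1024:
        return False
    text = data.decode("ascii", errors="replace")
    stripped = "".join(text.split())  # ignore line breaks and spaces
    return all(c in B64_CHARS for c in stripped)

# base64.b64encode stands in for generating a shared key
key = base64.b64encode(b"a shared secret for the whole replica set")
print(valid_keyfile(key))
```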
See also
See the “Replica Set Reconfiguration” section for information regarding the process for changing a replica set during operation.
Additionally, consider the “Replica Set Security” section for information on configuring authentication with replica sets.
Finally, see the “Replication” index and the “Replication Fundamentals” document for more information on replication in MongoDB and replica set configuration in general.
Sharding requires a number of mongod instances with different configurations. The config servers store the cluster’s metadata, while the cluster distributes data among one or more shard servers.
Note
Config servers are not replica sets.
To set up one or three “config server” instances, configure them as normal mongod instances and then add the following configuration options:
configsvr = true
bind_ip = 10.8.0.12
port = 27001
This creates a config server running on the private IP address 10.8.0.12 on port 27001. Make sure that there are no port conflicts, and that your config server is accessible from all of your “mongos” and “mongod” instances.
To set up shards, configure two or more mongod instances using your base configuration, adding the shardsvr setting:
shardsvr = true
Finally, to establish the cluster, configure at least one mongos process with the following settings:
configdb = 10.8.0.12:27001
chunkSize = 64
You can specify multiple configdb instances by specifying hostnames and ports in the form of a comma separated list. In general, avoid modifying the chunkSize from the default value of 64, [1] and ensure that this setting is consistent among all mongos instances.
| [1] | Chunk size is 64 megabytes by default, which provides the ideal balance between the most even distribution of data, for which smaller chunk sizes are best, and minimizing chunk migration, for which larger chunk sizes are optimal. |
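The tradeoff in the footnote can be made concrete: for a fixed amount of data, the chunk size roughly determines how many chunks the balancer must manage and migrate. A hypothetical back-of-the-envelope sketch in Python (not MongoDB code; real chunk counts also depend on the shard key distribution):

```python
import math

def chunk_count(data_mb, chunk_size_mb=64):
    """Rough lower bound on the number of chunks for a collection of
    the given size. Illustrative only: the real count also depends on
    the shard key distribution and migration history."""
    return max(1, math.ceil(data_mb / chunk_size_mb))

# For a 10 GB collection: smaller chunks give finer-grained balancing
# but more migrations; larger chunks give the reverse.
for size_mb in (16, 64, 256):
    print(size_mb, chunk_count(10 * 1024, size_mb))
```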
See also
The “Sharding” section of the manual for more information on sharding and cluster configuration.
Although running multiple instances of mongod on a single system is not recommended in many cases, for some types of deployments [2] and for testing purposes you may need to run more than one mongod on a single system.
In these cases, use a base configuration for each instance, but consider the following configuration values:
dbpath = /srv/mongodb/db0/
pidfilepath = /srv/mongodb/db0.pid
The dbpath value controls the location of the mongod instance’s data directory. Ensure that each database has a distinct and well-labeled data directory. The pidfilepath controls where the mongod process places its pid file. As this tracks the specific mongod process, it is crucial that the file be unique and well labeled to make it easy to start and stop these processes.
Create additional control scripts and/or adjust your existing MongoDB configuration and control script as needed to control these processes.
| [2] | Single-tenant systems with SSD or other high performance disks may provide acceptable performance levels for multiple mongod instances. Additionally, you may find that multiple databases with small working sets may function acceptably on a single system. |
The following configuration options control various mongod behaviors for diagnostic purposes. The following settings have default values that are tuned for general production purposes:
slowms = 50
profile = 3
verbose = true
diaglog = 3
objcheck = true
cpu = true
Use the base configuration and add these options if you are experiencing some unknown issue or performance problem as needed:
slowms configures the threshold for the database profiler to consider a query “slow.” The default value is 100 milliseconds. Set a lower value if the database profiler does not return useful results. See Optimization Strategies for MongoDB Applications for more information on optimizing operations in MongoDB.
profile sets the database profiler level. The profiler is not active by default because of the possible impact of the profiler itself on performance. Unless this setting has a value, queries are not profiled.
verbose enables a verbose logging mode that modifies mongod output and increases logging to include a greater number of events. Only use this option if you are experiencing an issue that is not reflected in the normal logging level. If you require additional verbosity, consider the following options:
v = true
vv = true
vvv = true
vvvv = true
vvvvv = true
Each additional level v adds additional verbosity to the logging. The verbose option is equal to v = true.
diaglog enables diagnostic logging. Level 3 logs all read and write operations.
objcheck forces mongod to validate all requests from clients upon receipt. Use this option to ensure that invalid requests are not causing errors, particularly when running a database with untrusted clients. This option may affect database performance.
cpu forces mongod to report the percentage of the last interval spent in write-lock. The interval is typically 4 seconds, and each output line in the log includes both the actual interval since the last report and the percentage of time spent in write lock.
MongoDB uses write ahead logging, or journaling, to guarantee write operation durability by way of an on-disk journal. Before applying a change to the data files, MongoDB writes the operation to the journal. Then, if MongoDB terminates or encounters an error unexpectedly before it can write the data to disk, MongoDB can re-apply the write operation and maintain a consistent state.
Journaling ensures that mongodb is crash resilient. Without a journal, if mongodb exits unexpectedly, you must assume your data is in an inconsistent state and must either run repair or preferably resync from a clean member of the replica set.
When journaling is enabled, if mongodb stops unexpectedly, the program can recover everything written to the journal, and the data is in a consistent state. By default, the greatest extent of lost writes, i.e., those not made to the journal, is no more than the last 100 milliseconds.
With journaling, if you want a data set to reside entirely in RAM, you need enough RAM to hold the dataset plus the “write working set.” The “write working set” is the amount of unique data you expect to see written between re-mappings of the private view. For information on views, see Storage Views used in Journaling.
Important
Changed in version 2.0: For 64-bit builds of mongod, journaling is enabled by default. For other platforms, see journal.
To enable journaling, start mongod with the --journal command line option.
If no journal files exist when mongod starts, it must preallocate new journal files. During this operation, the mongod is not listening for connections until preallocation completes: for some systems this may take several minutes. During this period your applications and the mongo shell are not available.
Warning
Do not disable journaling on production systems. If your mongod instance stops unexpectedly without shutting down cleanly for any reason (e.g. power failure) and you are not running with journaling, then you must recover from an unaffected replica set member or backup, as described in repair.
To disable journaling, shut down mongod cleanly and restart with the --nojournal command line option.
You can get commit acknowledgement with the getLastError command and the j option. For details, see Internal Operation of Write Concern.
To avoid preallocation lag, you can preallocate files in the journal directory by copying them from another instance of mongod.
Preallocated files do not contain data. It is safe to later remove them. But if you restart mongod with journaling, mongod will create them again.
Example
The following sequence preallocates journal files for an instance of mongod running on port 27017 with a database path of /data/db.
For demonstration purposes, the sequence starts by creating a set of journal files in the usual way.
Create a temporary directory into which to create a set of journal files:
mkdir ~/tmpDbpath
Create a set of journal files by starting a mongod instance that uses the temporary directory:
mongod --port 10000 --dbpath ~/tmpDbpath --journal
When you see the following log output, indicating that mongod has preallocated the journal files, press CONTROL+C to stop the mongod instance:
web admin interface listening on port 11000
Preallocate journal files for the new instance of mongod by moving the journal files from the data directory of the existing instance to the data directory of the new instance:
mv ~/tmpDbpath/journal /data/db/
Start the new mongod instance:
mongod --port 27017 --dbpath /data/db --journal
Use the following commands and methods to monitor journal status:
The serverStatus command returns database status information that is useful for assessing performance.
Use journalLatencyTest to measure how long it takes on your volume to write to the disk in an append-only fashion. You can run this command on an idle system to get a baseline sync time for journaling. You can also run this command on a busy system to see the sync time on a busy system, which may be higher if the journal directory is on the same volume as the data files.
The journalLatencyTest command also provides a way to check if your disk drive is buffering writes in its local cache. If the number is very low (i.e., less than 2 milliseconds) and the drive is non-SSD, the drive is probably buffering writes. In that case, enable cache write-through for the device in your operating system, unless you have a disk controller card with battery backed RAM.
Changed in version 2.0.
You can set the group commit interval using the --journalCommitInterval command line option. The allowed range is 2 to 300 milliseconds.
Lower values increase the durability of the journal at the expense of disk performance.
On a restart after a crash, MongoDB replays all journal files in the journal directory before the server becomes available. If MongoDB must replay journal files, mongod notes these events in the log output.
There is no reason to run repair in these situations.
When running with journaling, MongoDB stores and applies write operations in memory and in the journal before the changes are in the data files.
With journaling enabled, MongoDB creates a journal directory within the directory defined by dbpath, which is /data/db by default. The journal directory holds journal files, which contain write-ahead redo logs. The directory also holds a last-sequence-number file. A clean shutdown removes all the files in the journal directory.
Journal files are append-only files and have file names prefixed with j._. When a journal file holds 1 gigabyte of data, MongoDB creates a new journal file. Once MongoDB applies all the write operations in the journal files, it deletes these files. Unless you write many bytes of data per-second, the journal directory should contain only two or three journal files.
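The rotation rule above can be sketched as a small simulation. This is a simplified model for illustration only (real mongod also deletes journal files once their operations are applied), not actual journal code:

```python
def journal_file_sizes(writes, rollover=1 << 30):
    """Model the rotation rule: appends go to the current j._<n> file,
    and a fresh file starts once the current one holds `rollover`
    bytes (1 gigabyte for real journal files). Returns the resulting
    file sizes. Simplified: real mongod also deletes applied files."""
    files = [0]
    for nbytes in writes:
        if files[-1] >= rollover:
            files.append(0)
        files[-1] += nbytes
    return files

# Tiny rollover threshold purely for demonstration
print(journal_file_sizes([5, 5, 5], rollover=8))
```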
To limit the size of each journal file to 128 megabytes, use the smallfiles run time option when starting mongod.
To speed the frequent sequential writes that occur to the current journal file, you can ensure that the journal directory is on a different filesystem or storage device.
Important
If you place the journal on a different filesystem from your data files you cannot use a filesystem snapshot to capture consistent backups of a dbpath directory.
Note
Depending on your file system, you might experience a preallocation lag the first time you start a mongod instance with journaling enabled. MongoDB preallocates journal files if it is faster on your file system to create files of a pre-defined size. The preallocation lag might last several minutes, during which you will not be able to connect to the database. This is a one-time preallocation and does not occur with future invocations.
To avoid preallocation lag, see Avoid Preallocation Lag.
Journaling adds three storage views to MongoDB.
The shared view stores modified data for upload to the MongoDB data files. The shared view is the only view with direct access to the MongoDB data files. When running with journaling, mongod asks the operating system to map your existing on-disk data files to the shared view memory view. The operating system maps the files but does not load them. MongoDB later loads data files to shared view as needed.
The private view stores data for use in read operations. MongoDB maps private view to the shared view and is the first place MongoDB applies new write operations.
The journal is an on-disk view that stores new write operations after MongoDB applies the operation to the private view but before applying them to the data files. The journal provides durability. If the mongod instance were to crash without having applied the writes to the data files, the journal could replay the writes to the shared view for eventual upload to the data files.
MongoDB copies the write operations to the journal in batches called group commits. By default, MongoDB performs a group commit every 100 milliseconds: as a result MongoDB commits all operations within a 100 millisecond window in a single batch. These “group commits” help minimize the performance impact of journaling.
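The group commit behavior can be modeled as batching writes by their timestamps. The sketch below illustrates the default 100 millisecond window described above; it is a simplification of mongod’s actual commit scheduler:

```python
def group_commits(write_times_ms, interval_ms=100):
    """Group write timestamps (in ms) into commit batches: a new
    batch opens with the first write after the previous batch's
    window closes. Models the default 100 ms group commit; the real
    scheduler is more sophisticated."""
    batches = []
    window_start = None
    for t in sorted(write_times_ms):
        if window_start is None or t >= window_start + interval_ms:
            batches.append([])
            window_start = t
        batches[-1].append(t)
    return batches

# Writes at 0, 40 and 90 ms share one commit; 150 ms opens the next.
print(group_commits([0, 40, 90, 150]))
```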
Journaling stores raw operations that allow MongoDB to reconstruct the following:
As write operations occur, MongoDB writes the data to the private view in RAM and then copies the write operations in batches to the journal. The journal stores the operations on disk to ensure durability. MongoDB adds the operations as entries on the journal’s forward pointer. Each entry describes which bytes the write operation changed in the data files.
MongoDB next applies the journal’s write operations to the shared view. At this point, the shared view becomes inconsistent with the data files.
At default intervals of 60 seconds, MongoDB asks the operating system to flush the shared view to disk. This brings the data files up-to-date with the latest write operations.
When MongoDB flushes write operations to the data files, MongoDB removes the write operations from the journal’s behind pointer. The behind pointer always trails the forward pointer.
As part of journaling, MongoDB routinely asks the operating system to remap the shared view to the private view, for consistency.
Note
The interaction between the shared view and the on-disk data files is similar to how MongoDB works without journaling, which is that MongoDB asks the operating system to flush in-memory changes back to the data files every 60 seconds.
This document outlines the use and operation of MongoDB’s SSL support. SSL allows MongoDB clients to make encrypted connections to mongod instances.
Note
The default distribution of MongoDB does not contain support for SSL.
As of the current release, to use SSL you must either build MongoDB locally, passing the “--ssl” option to scons, or use the MongoDB subscriber build.
These instructions outline the process for getting started with SSL and assume that you have already installed a build of MongoDB that includes SSL support and that your client driver supports SSL.
Add the following command line options to your mongod invocation:
mongod --sslOnNormalPorts --sslPEMKeyFile <pem> --sslPEMKeyPassword <pass>
Replace “<pem>” with the path to your SSL certificate .pem file, and “<pass>” with the password you used to encrypt the .pem file.
You may also specify these options in your “mongodb.conf” file with following options:
sslOnNormalPorts = true
sslPEMKeyFile = /etc/ssl/mongodb.pem
sslPEMKeyPassword = pass
Modify these values to reflect the location of your actual .pem file and its password.
You can specify these configuration options in a configuration file for mongos, or start mongos with the following invocation:
mongos --sslOnNormalPorts --sslPEMKeyFile <pem> --sslPEMKeyPassword <pass>
You can use any existing SSL certificate, or you can generate your own SSL certificate using a command that resembles the following:
cd /etc/ssl/
openssl req -new -x509 -days 365 -nodes -out mongodb-cert.pem -keyout mongodb-cert.key
To create the combined .pem file that contains the .key file and the .pem certificate, use the following command:
cat mongodb-cert.key mongodb-cert.pem > mongodb.pem
Clients must have support for SSL to work with a mongod instance that has SSL support enabled. The current versions of the Python, Java, Ruby, and Node.js drivers have support for SSL, with full support coming in future releases of other drivers.
The mongo shell distributed with the subscriber build is compiled with SSL support and also supports SSL connections. Use the “--ssl” flag as follows:
mongo --ssl --host <host>
The MMS agent will also have to connect via SSL in order to gather its stats. Because the agent already utilizes SSL for its communications to the MMS servers, this is just a matter of enabling SSL support in MMS itself on a per host basis.
Use the “Edit” host button (i.e. the pencil) on the Hosts page in the MMS console to enable SSL. This feature is currently enabled on a group-by-group basis by 10gen.
Please see the MMS Manual for more information about MMS configuration.
Add the “ssl=True” parameter to a PyMongo Connection to create a MongoDB connection to an SSL MongoDB instance:
from pymongo import Connection
c = Connection(host="mongodb.example.net", port=27017, ssl=True)
To connect to a replica set, use the following operation:
from pymongo import ReplicaSetConnection
c = ReplicaSetConnection("mongodb.example.net:27017",
replicaSet="mysetname", ssl=True)
PyMongo also supports an “ssl=true” option for the MongoDB URI:
mongodb://mongodb.example.net:27017/?ssl=true
Consider the following example “sslApp.java” class file:
import com.mongodb.*;
import javax.net.ssl.SSLSocketFactory;

public class sslApp {

    public static void main(String args[]) throws Exception {

        // configure the driver to use an SSL socket factory
        MongoOptions o = new MongoOptions();
        o.socketFactory = SSLSocketFactory.getDefault();

        Mongo m = new Mongo( "localhost" , o );
        DB db = m.getDB( "test" );
        DBCollection c = db.getCollection( "foo" );

        System.out.println( c.findOne() );
    }
}
Recent versions of the Ruby driver support connections to SSL servers. Install the latest version of the driver with the following command:
gem install mongo
Then connect to a standalone instance, using the following form:
require 'rubygems'
require 'mongo'
connection = Mongo::Connection.new('localhost', 27017, :ssl => true)
Replace connection with the following if you’re connecting to a replica set:
connection = Mongo::ReplSetConnection.new(['localhost:27017'],
['localhost:27018'],
:ssl => true)
Here, mongod instances run on “localhost:27017” and “localhost:27018”.
In the node-mongodb-native driver, use the following invocation to connect to a mongod or mongos instance via SSL:
var db1 = new Db(MONGODB, new Server("127.0.0.1", 27017,
  { auto_reconnect: false, poolSize: 4, ssl: ssl }));
To connect to a replica set via SSL, use the following form:
var replSet = new ReplSetServers( [
new Server( RS.host, RS.ports[1], { auto_reconnect: true } ),
new Server( RS.host, RS.ports[0], { auto_reconnect: true } ),
],
{rs_name:RS.name, ssl:ssl}
);
As of release 1.6, the .NET driver supports SSL connections with mongod and mongos instances. To connect using SSL, you must add an option to the connection string, specifying ssl=true as follows:
var connectionString = "mongodb://localhost/?ssl=true";
var server = MongoServer.Create(connectionString);
The .NET driver will validate the certificate against the local trusted certificate store, in addition to providing encryption of the connection. This behavior may produce issues during testing if the server uses a self-signed certificate. If you encounter this issue, add the sslverifycertificate=false option to the connection string to prevent the .NET driver from validating the certificate, as follows:
var connectionString = "mongodb://localhost/?ssl=true&sslverifycertificate=false";
var server = MongoServer.Create(connectionString);
Monitoring is a critical component of all database administration. A firm grasp of MongoDB’s reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB’s normal operational parameters will allow you to diagnose issues as you encounter them, rather than waiting for a crisis or failure.
This document provides an overview of the available tools and data provided by MongoDB, as well as an introduction to diagnostic strategies and suggestions for monitoring instances in MongoDB’s replica sets and sharded clusters.
Note
10gen provides a hosted monitoring service which collects and aggregates these data to provide insight into the performance and operation of MongoDB deployments. See the MongoDB Monitoring Service (MMS) and the MMS documentation for more information.
There are two primary methods for collecting data regarding the state of a running MongoDB instance. First, there are a set of tools distributed with MongoDB that provide real-time reporting of activity on the database. Second, several database commands return statistics regarding the current database state with greater fidelity. Both methods allow you to collect data that answers a different set of questions, and are useful in different contexts.
This section provides an overview of these utilities and statistics, along with an example of the kinds of questions that each method is most suited to help you address.
The MongoDB distribution includes a number of utilities that return statistics about instances’ performance and activity quickly. These are typically most useful for diagnosing issues and assessing normal operation.
mongotop tracks and reports the current read and write activity of a MongoDB instance. mongotop provides per-collection visibility into use. Use mongotop to verify that activity and use match expectations. See the mongotop manual for details.
mongostat captures and returns counters of database operations. mongostat reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat manual for details.
MongoDB provides a REST interface that exposes diagnostic and monitoring information in a simple web page. Enable this by setting rest to true, and access this page via the localhost interface using the port numbered 1000 more than the database port. In default configurations the REST interface is accessible on 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017
MongoDB provides a number of commands that return statistics about the state of the MongoDB instance. These data may provide finer granularity regarding the state of the MongoDB instance than the tools above. Consider using their output in scripts and programs to develop custom alerts, or modifying the behavior of your application in response to the activity of your instance.
Access serverStatus data by way of the serverStatus command. This document contains a general overview of the state of the database, including disk usage, memory use, connections, journaling, and index accesses. The command returns quickly and does not impact MongoDB performance.
While this output contains a (nearly) complete account of the state of a MongoDB instance, in most cases you will not run this command directly. Nevertheless, all administrators should be familiar with the data provided by serverStatus.
See also
View the replSetGetStatus data with the replSetGetStatus command (rs.status() from the shell). The document returned by this command reflects the state and configuration of the replica set. Use this data to ensure that replication is properly configured, and to check the connections between the current host and the members of the replica set.
The dbStats data is accessible by way of the dbStats command (db.stats() from the shell). This command returns a document that contains data that reflects the amount of storage used and data contained in the database, as well as object, collection, and index counters. Use this data to check and track the state and storage of a specific database. This output also allows you to compare utilization between databases and to determine average document size in a database.
The collStats data is accessible using the collStats command (db.printCollectionStats() from the shell). It provides statistics that resemble dbStats on the collection level: this includes a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about the indexes.
In addition to status reporting, MongoDB provides a number of introspection tools that you can use to diagnose and analyze performance and operational conditions. Consider the following documentation:
A number of third party monitoring tools have support for MongoDB, either directly, or through their own plugins.
These are monitoring tools that you must install, configure and maintain on your own servers, usually open source.
| Tool | Plugin | Description |
|---|---|---|
| Ganglia | mongodb-ganglia | Python script to report operations per second, memory usage, btree statistics, master/slave status and current connections. |
| Ganglia | gmond_python_modules | Parses output from the serverStatus and replSetGetStatus commands. |
| Motop | None | Realtime monitoring tool for several MongoDB servers. Shows current operations ordered by durations every second. |
| mtop | None | A top like tool. |
| Munin | mongo-munin | Retrieves server statistics. |
| Munin | mongomon | Retrieves collection statistics (sizes, index sizes, and each (configured) collection count for one DB). |
| Munin | munin-plugins Ubuntu PPA | Some additional munin plugins not in the main distribution. |
| Nagios | nagios-plugin-mongodb | A simple Nagios check script, written in Python. |
| Zabbix | mikoomi-mongodb | Monitors availability, resource utilization, health, performance and other important metrics. |
Also consider dex, an index and query analyzing tool for MongoDB that compares MongoDB log files and indexes to make indexing recommendations.
These are monitoring tools provided as a hosted service, usually on a subscription billing basis.
| Name | Notes |
|---|---|
| Scout | Several plugins including: MongoDB Monitoring, MongoDB Slow Queries and MongoDB Replica Set Monitoring. |
| Server Density | Dashboard for MongoDB, MongoDB specific alerts, replication failover timeline and iPhone, iPad and Android mobile apps. |
During normal operation, mongod and mongos instances report information that reflects current operation to standard output or a log file. The following runtime settings control these options.
quiet. Limits the amount of information written to the log or output.
verbose. Increases the amount of information written to the log or output.
You can also specify this as v (as in -v). For higher levels of verbosity, set multiple v characters, as in vvvv. You can also change the verbosity of a running mongod or mongos instance with the setParameter command.
logpath. Enables logging to a file, rather than standard output. Specify the full path to the log file with this setting.
logappend. Adds information to a log file instead of overwriting the file.
Note
You can specify these configuration options as command line arguments to mongod or mongos.
Additionally, several database commands also affect logging.
Degraded performance in MongoDB can be the result of an array of causes, and is typically a function of the relationship between the quantity of data stored in the database, the amount of system RAM, the number of connections to the database, and the amount of time the database spends in a lock state.
In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability of hardware on the host system for virtualized environments. Some users also experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other situations, performance issues may indicate that the database may be operating at capacity and that it’s time to add additional capacity to the database.
MongoDB uses a locking system to ensure consistency; however, if certain operations are long-running or a queue forms, performance slows as requests and operations wait for the lock. Because lock-related slowdowns can be intermittent, look to the data in the globalLock section of the serverStatus response to assess whether the lock has been a challenge to your performance. If globalLock.currentQueue.total is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that might affect performance.
If globalLock.totalTime is high in the context of uptime, then the database has existed in a lock state for a significant amount of time. If globalLock.ratio is also high, MongoDB has likely been processing a large number of long-running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults and disk reads.
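As a rough sketch of this heuristic, the following Python fragment inspects a serverStatus-style document. The threshold values and the sample numbers are assumptions for illustration only; globalLock.totalTime is reported in microseconds and uptime in seconds.

```python
def lock_report(status):
    """Return warnings derived from the globalLock section of serverStatus."""
    gl = status["globalLock"]
    warnings = []
    if gl["currentQueue"]["total"] > 100:  # queue threshold is an assumption
        warnings.append("many operations queued for the lock")
    uptime_micros = status["uptime"] * 1_000_000  # uptime is reported in seconds
    if uptime_micros and gl["totalTime"] / uptime_micros > 0.2:  # 20% is an assumption
        warnings.append("server has spent a large fraction of uptime in a lock state")
    return warnings

sample = {
    "uptime": 3600,  # seconds
    "globalLock": {"totalTime": 900_000_000,  # microseconds
                   "currentQueue": {"total": 150}},
}
print(lock_report(sample))
```

In practice you would feed this function the document returned by db.serverStatus() and tune the thresholds to your workload.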
Because MongoDB uses memory mapped files to store data, given a data set of sufficient size, the MongoDB process will allocate all memory available on the system for its use. Because of the way operating systems function, the amount of allocated RAM is not a useful reflection of MongoDB’s state.
While this is part of the design, and affords MongoDB superior performance, the memory mapped files make it difficult to determine if the amount of RAM is sufficient for the data set. Consider memory usage statuses to better understand MongoDB’s memory utilization. Check the resident memory use (i.e. mem.resident:) if this exceeds the amount of system memory and there’s a significant amount of data on disk that isn’t in RAM, you may have exceeded the capacity of your system.
Also check the amount of mapped memory (i.e. mem.mapped). If this value is greater than the amount of system memory, some operations will incur page faults to read data from virtual memory, with deleterious effects on performance.
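The two checks above can be sketched as a small helper that classifies the mem section of serverStatus against system RAM. The helper name, the thresholds, and the sample figures (12 GB of mapped data on an 8 GB machine) are illustrative assumptions, not part of MongoDB itself.

```python
def memory_pressure(mem, system_ram_mb):
    """Classify memory use from the serverStatus "mem" section (values in MB)."""
    if mem["resident"] > system_ram_mb:
        return "resident memory exceeds system RAM"
    if mem["mapped"] > system_ram_mb:
        return "mapped memory exceeds system RAM; expect page faults"
    return "ok"

# illustrative numbers: 12 GB mapped on an 8 GB machine
print(memory_pressure({"resident": 3500, "mapped": 12000}, system_ram_mb=8192))
```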
Page faults represent the number of times that MongoDB requires data not located in physical memory, and must read from virtual memory. To check for page faults, see the extra_info.page_faults value in the serverStatus command. This data is only available on Linux systems.
Alone, page faults are minor and complete quickly; however, in aggregate, large numbers of page faults typically indicate that MongoDB is reading too much data from disk and can indicate a number of underlying causes and recommendations. In many situations, MongoDB’s read locks will “yield” after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read into memory. This approach improves concurrency, and in high volume systems this also improves overall throughput.
If possible, increasing the amount of RAM accessible to MongoDB may help reduce the number of page faults. If this is not possible, you may want to consider deploying a sharded cluster and/or adding one or more shards to your deployment to distribute load among mongod instances.
In some cases, the number of connections between the application layer (i.e. clients) and the database can overwhelm the ability of the server to handle requests which can produce performance irregularities. Check the following fields in the serverStatus document:
Note
Unless limited by system-wide limits MongoDB has a hard connection limit of 20 thousand connections. You can modify system limits using the ulimit command, or by editing your system’s /etc/sysctl file.
If requests are high because there are many concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy applications increase the size of your replica set and distribute read operations to secondary members. For write heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances.
Spikes in the number of connections can also be the result of application or driver errors. All of the MongoDB drivers supported by 10gen implement connection pooling, which allows clients to use and reuse connections more efficiently. Extremely high numbers of connections, particularly without corresponding workload, are often indicative of a driver or other configuration error.
MongoDB contains a database profiling system that can help identify inefficient queries and operations. Enable the profiler by setting the profile value using the following command in the mongo shell:
db.setProfilingLevel(1)
See
The documentation of db.setProfilingLevel() for more information about this command.
Note
Because the database profiler can have an impact on the performance, only enable profiling for strategic intervals and as minimally as possible on production systems.
You may enable profiling on a per-mongod basis. This setting will not propagate across a replica set or sharded cluster.
The following profiling levels are available:
| Level | Setting |
|---|---|
| 0 | Off. No profiling. |
| 1 | On. Only includes slow operations. |
| 2 | On. Includes all operations. |
See the output of the profiler in the system.profile collection of your database. You can specify the slowms to set a threshold above which the profiler considers operations “slow” and thus included in the level 1 profiling data. You may configure slowms at runtime, as an argument to the db.setProfilingLevel() operation.
Additionally, mongod records all “slow” queries to its log, as defined by slowms. The data in system.profile does not persist between mongod restarts.
You can view the profiler’s output by issuing the show profile command in the mongo shell, or with the following operation:
db.system.profile.find( { millis : { $gt : 100 } } )
This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (i.e. 100) is above the slowms threshold.
See also
Optimization Strategies for MongoDB Applications addresses strategies that may improve the performance of your database queries and operations.
The primary administrative concern that requires monitoring with replica sets, beyond the requirements for any MongoDB instance, is “replication lag.” This refers to the amount of time it takes for a write operation on the primary to replicate to a secondary. Some very small delay period may be acceptable; however, as replication lag grows, two significant problems emerge:
For causes of replication lag, see Replication Lag.
Replication issues are most often the result of network connectivity issues between members or the result of a primary that does not have the resources to support application and replication traffic. To check the status of a replica, use the replSetGetStatus or the following helper in the shell:
rs.status()
See the Replica Set Status Reference document for a more in-depth overview of this output. In general, watch the value of optimeDate. Pay particular attention to the difference in time between the primary and the secondary members.
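For example, replication lag is simply the difference between the primary’s optimeDate and a secondary’s optimeDate. A minimal Python sketch, using made-up timestamps for illustration:

```python
from datetime import datetime

def replication_lag(primary_optime, secondary_optime):
    """Seconds by which the secondary's last applied operation trails the primary's."""
    return (primary_optime - secondary_optime).total_seconds()

# illustrative optimeDate values, as might appear in rs.status() output
primary = datetime(2012, 11, 1, 12, 0, 30)
secondary = datetime(2012, 11, 1, 12, 0, 12)
print(replication_lag(primary, secondary))  # 18.0
```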
The size of the operation log is only configurable during the first run, using the --oplogSize argument to the mongod command or, preferably, the oplogSize setting in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet option, mongod will create a default-sized oplog.
By default the oplog is 5% of total available disk space on 64-bit systems.
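A quick back-of-the-envelope helper for that default, based on the 5% figure stated above (the function name is ours, not MongoDB's):

```python
def default_oplog_size_gb(free_disk_gb):
    """Default oplog allocation on 64-bit systems: 5% of available disk space."""
    return free_disk_gb * 0.05

print(default_oplog_size_gb(500))  # a 500 GB volume yields a 25.0 GB oplog
```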
See also
In most cases the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB instances. Additionally, clusters require monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.
See also
See the Sharding wiki page for more information.
The config database provides a map of documents to shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, some sharding operations like moving chunks and starting mongos instances become unavailable. However, clusters remain accessible from already-running mongos instances.
Because inaccessible configuration servers can have a serious impact on the availability of a sharded cluster, you should monitor the configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.
The most effective sharded cluster deployments require that chunks are evenly balanced between the shards. MongoDB has a background balancer process that distributes data such that chunks are always optimally distributed among the shards. Issue the db.printShardingStatus() or sh.status() command to the mongos by way of the mongo shell. This returns an overview of the entire cluster including the database name, and a list of the chunks.
In nearly every case, all locks used by the balancer are automatically released when they become stale. However, because any long lasting lock can block future balancing, it’s important to ensure that all locks are legitimate. To check the lock status of the database, connect to a mongos instance using the mongo shell. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:
use config
db.locks.find()
For active deployments, the above query might return a useful result set. The balancing process, which originates on a randomly selected mongos, takes a special “balancer” lock that prevents other balancing activity from transpiring. Use the following command, also against the config database, to check the status of the “balancer” lock.
db.locks.find( { _id : "balancer" } )
If this lock exists, make sure that the balancer process is actively using this lock.
Full database instance backups are useful for disaster recovery protection and routine database backup operations; however, some cases require additional import and export functionality.
This document provides an overview of the import and export programs included in the MongoDB distribution. These tools are useful when you want to backup or export a portion of your data without capturing the state of the entire database, or for simple data ingestion cases. For more complex data migration tasks, you may want to write your own import and export scripts using a client driver to interact with the database itself.
Warning
Because these tools primarily operate by interacting with a running mongod instance, they can impact the performance of your running database.
Not only do these backup processes create traffic for a running database instance, they also force the database to read all data through memory. When MongoDB reads infrequently used data, it can supplant more frequently accessed data, causing a deterioration in performance for the database’s regular workload.
mongoimport and mongoexport do not reliably preserve all rich BSON data types because BSON is a superset of JSON: mongoimport and mongoexport cannot represent BSON data accurately in JSON. As a result, data exported or imported with these tools may lose some measure of fidelity. See the “MongoDB Extended JSON” wiki page for more information, and use these tools with care.
See also
See the “Backup and Restoration Strategies” document for more information on backing up MongoDB instances. Additionally, consider the following references for commands addressed in this document:
If you want to transform and process data once you’ve imported it in MongoDB consider the topics in Aggregation, including:
JSON does not have the following data types that exist in BSON documents: data_binary, data_date, data_timestamp, data_regex, data_oid and data_ref. As a result using any tool that decodes BSON documents into JSON will suffer some loss of fidelity.
If maintaining type fidelity is important, consider writing a data import and export system that does not force BSON documents into JSON form as part of the process. The following list of types contains examples of how BSON documents render in JSON.
data_binary
{ "$binary" : "<bindata>", "$type" : "<t>" }
<bindata> is the base64 representation of a binary string. <t> is the hexadecimal representation of a single byte indicating the data type.
data_date
Date( <date> )
<date> is the JSON representation of a 64-bit signed integer for milliseconds since epoch.
data_timestamp
Timestamp( <t>, <i> )
<t> is the JSON representation of a 32-bit unsigned integer for seconds since epoch. <i> is a 32-bit unsigned integer for the increment.
data_regex
/<jRegex>/<jOptions>
<jRegex> is a string that may contain valid JSON characters and unescaped double quote (i.e. ") characters, but may not contain unescaped forward slash (i.e. /) characters. <jOptions> is a string that may contain only the characters g, i, m, and s.
data_oid
ObjectId( "<id>" )
<id> is a 24 character hexadecimal string. These representations require that data_oid values have an associated field named “_id.”
data_ref
DBRef( "<name>", "<id>" )
<name> is a string of valid JSON characters. <id> is a 24 character hexadecimal string.
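Two of the representations above can be built with plain Python and no BSON library, which may help when writing the kind of custom import/export system suggested earlier. This is a sketch: the helper names are ours, and only the field layouts come from the list above.

```python
import base64

def data_binary(raw_bytes, subtype=0):
    """Build the Extended JSON form { "$binary": "<bindata>", "$type": "<t>" }."""
    return {"$binary": base64.b64encode(raw_bytes).decode("ascii"),
            "$type": format(subtype, "02x")}  # single byte as two hex digits

def is_valid_oid(s):
    """data_oid values are 24-character hexadecimal strings."""
    return len(s) == 24 and all(c in "0123456789abcdef" for c in s.lower())

print(data_binary(b"hello"))                      # {'$binary': 'aGVsbG8=', '$type': '00'}
print(is_valid_oid("4f2e1b9c8d7a6b5c4d3e2f10"))   # True
```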
See also
“MongoDB Extended JSON” wiki page.
For resilient and non-disruptive backups, use a file system or block-level disk snapshot function, such as the methods described in the “Backup and Restoration Strategies” document. The tools and operations discussed in this document provide functionality that is useful for some kinds of backups.
By contrast, use import and export tools to backup a small subset of your data or to move data to or from a 3rd party system. These backups may capture a small crucial set of data or a frequently modified section of data, for extra insurance, or for ease of access. No matter how you decide to import or export your data, consider the following guidelines:
This section describes a process to import/export your database, or a portion thereof, to a file in a JSON or CSV format.
See also
The mongoimport and mongoexport documents contain complete documentation of these tools. If you have questions about the function and parameters of these tools not covered here, please refer to these documents.
If you want to simply copy a database or collection from one instance to another, consider using the copydb, clone, or cloneCollection commands, which may be more suited to this task. The mongo shell provides the db.copyDatabase() method.
These tools may also be useful for importing data into a MongoDB database from third party applications.
With the mongoexport utility you can create a backup file. In its simplest invocation, the command takes the following form:
mongoexport --collection collection --out collection.json
This will export all documents in the collection named collection into the file collection.json. Without the output specification (i.e. “--out collection.json”), mongoexport writes output to standard output (i.e. “stdout”). You can further narrow the results by supplying a query filter using the “--query” option and limit results to a single database using the “--db” option. For instance:
mongoexport --db sales --collection contacts --query '{"field": 1}'
This command returns all documents in the sales database’s contacts collection, with a field named field with a value of 1. Enclose the query in single quotes (e.g. ') to ensure that it does not interact with your shell environment. The resulting documents will return on standard output.
By default, mongoexport returns one JSON document per MongoDB document. Specify the “--jsonArray” argument to return the export as a single JSON array. Use the “--csv” option to return the result in CSV (comma separated values) format.
If your mongod instance is not running, you can use the “--dbpath” option to specify the location to your MongoDB instance’s database files. See the following example:
mongoexport --db sales --collection contacts --dbpath /srv/MongoDB/
This reads the data files directly. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongoexport in this configuration.
The “--host” and “--port” options allow you to specify a non-local host from which to capture the export. Consider the following example:
mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection contacts --out mdb1-examplenet.json
On any mongoexport command you may, as above, specify username and password credentials.
To restore a backup taken with mongoexport, use mongoimport. Most of the arguments to mongoexport also exist for mongoimport. Consider the following command:
mongoimport --collection collection --file collection.json
This imports the contents of the file collection.json into the collection named collection. If you do not specify a file with the “--file” option, mongoimport accepts input over standard input (i.e. “stdin”).
If you specify the “--upsert” option, all of mongoimport’s operations will attempt to update existing documents in the database and insert other documents. This option may cause some performance impact depending on your configuration.
You can specify the database option “--db” to import these documents to a particular database. If your MongoDB instance is not running, use the “--dbpath” option to specify the location of your MongoDB instance’s database files. Consider using the “--journal” option to ensure that mongoimport records its operations in the journal. The mongod process must not be running or attached to these data files when you run mongoimport in this configuration.
Use the “--ignoreBlanks” option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.
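The effect of “--ignoreBlanks” can be illustrated with plain Python: drop empty CSV fields before they become empty strings in the stored document. The field names below are hypothetical.

```python
import csv, io

csv_data = "name,email,phone\nAlice,,555-0100\n"
row = next(csv.DictReader(io.StringIO(csv_data)))

# without --ignoreBlanks, the blank email field would be stored as ""
doc = {k: v for k, v in row.items() if v != ""}
print(doc)  # {'name': 'Alice', 'phone': '555-0100'}
```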
This document provides an inventory of database backup strategies for use with MongoDB. This document contains the following sections:
Production systems should always have some consideration and strategy for taking and restoring backups. The goal of a backup strategy is to produce a full and consistent copy of the data that you can use to bring up a new or replacement database instance. However, in some cases, taking backups is difficult or impossible, given large data volumes, distributed architectures, and data transmission speeds.
Nevertheless, with MongoDB, there are two major approaches to backups: [1]
Using system-level tools, like disk image snapshots.
Using various capacities present in the mongodump tool.
The methods described in this document operate by copying the data files at the disk level. If your system does not provide functionality for this kind of backup, see the section on Using Binary Database Dumps for Backups.
Ensuring that the state captured by the backup is consistent and usable is the primary challenge of producing backups of database systems. Backups that you cannot produce reliably, or restore from feasibly, are worthless.
As you develop your backup system, take into consideration the specific features of your deployment, your use patterns, and your architecture.
Because every environment is unique it’s important to regularly test the backups that you capture to ensure that your backup system is practically, and not just theoretically, functional.
| [1] | In many situations increasing the amount of replication provides useful assurances with data sets that are difficult to restore in a timely manner. |
When evaluating a backup strategy for your MongoDB deployment consider the following factors:
With this information in hand you can begin to develop a backup plan for your database. Remember that all backup plans must be:
This section provides an overview of using disk/block level snapshots (i.e. LVM or storage appliance) to back up a MongoDB instance. These tools make a quick block-level backup of the device that holds MongoDB’s data files. These methods complete quickly, work reliably, and typically provide the easiest backup method to implement.
Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to “hard links.” As the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result the snapshot only stores modified data.
After making the snapshot, you mount the snapshot image on your file system and copy data from the snapshot. The resulting backup contains a full copy of all data.
Snapshots have the following limitations:
The database must be in a consistent or recoverable state when the snapshot takes place. This means that all writes accepted by the database need to be fully written to disk: either to the journal or to data files.
If all writes are not on disk when the backup occurs, the backup will not reflect these changes. If writes are in progress when the backup occurs, the data files will reflect an inconsistent state. With journaling all data-file states resulting from in-progress writes are recoverable; without journaling you must flush all pending writes to disk before running the backup operation and must ensure that no writes occur during the entire backup procedure.
If you do use journaling, the journal must reside on the same volume as the data.
Snapshots create an image of an entire disk image. Unless you need to back up your entire system, consider isolating your MongoDB data files, journal (if applicable), and configuration on one logical disk that doesn’t contain any other data.
Alternately, store all MongoDB data files on a dedicated device so that you can make backups without duplicating extraneous data.
Ensure that you copy data from snapshots and onto other systems to ensure that data is safe from site failures.
If your system has snapshot capability and your mongod instance has journaling enabled, then you can use any kind of file system or volume/block level snapshot tool to create backups.
Warning
Changed in version 1.9.2.
Journaling is only enabled by default on 64-bit builds of MongoDB.
To enable journaling on all other builds, specify journal = true in the configuration or use the --journal run-time option for mongod.
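For example, a minimal configuration file fragment that enables journaling might look like the following; the paths shown are illustrative:

```ini
# /etc/mongodb.conf -- relevant settings only
journal = true
dbpath = /srv/mongodb
```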
Many service providers provide a block-level backup service based on disk image snapshots. If you manage your own infrastructure on a Linux-based system, configure your system with LVM to provide your disk packages and provide snapshot capability. You can also use LVM-based setups within a cloud/virtualized environment.
Note
Running LVM provides additional flexibility and enables the possibility of using snapshots to back up MongoDB.
If you use Amazon’s EBS service in a software RAID 10 configuration, use LVM to capture a consistent disk image. Also, see the special considerations described in Amazon EBS in Software RAID 10 Configuration.
The following sections provide an overview of a simple backup process using LVM on a Linux system. While the tools, commands, and paths may be (slightly) different on your system the following steps provide a high level overview of the backup operation.
To create a snapshot with LVM, issue a command, as root, in the following format:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group.
This example creates a snapshot named mdb-snap01 located at /dev/vg0/mdb-snap01. The location and paths to your system’s volume groups and devices may vary slightly depending on your operating system’s LVM configuration.
The snapshot has a cap of 100 megabytes, set by the parameter --size 100M. This size does not reflect the total amount of data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and the state at the snapshot’s creation (i.e. /dev/vg0/mdb-snap01.)
Warning
Ensure that you create snapshots with enough space to account for data growth, particularly for the period of time that it takes to copy data out of the system or to a temporary image.
If your snapshot runs out of space, the snapshot image becomes unusable. Discard this logical volume and create another.
The snapshot will exist when the command returns. You can restore directly from the snapshot at any time, or you can create a new logical volume and restore from the snapshot to that alternate image.
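To confirm that the snapshot exists, and to watch how much of its allocated copy-on-write space it has consumed, you can use the standard LVM reporting tools; the volume names here match the example above:

```shell
# List logical volumes in the vg0 group; for the snapshot mdb-snap01 the
# Data% column shows how much of its 100M allocation has been used.
lvs vg0

# Detailed view of the snapshot volume, including the space
# "Allocated to snapshot".
lvdisplay /dev/vg0/mdb-snap01
```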
While snapshots are great for creating high quality backups very quickly, they are not ideal as a format for storing backup data. Snapshots typically depend and reside on the same storage infrastructure as the original disk images. Therefore, it’s crucial that you archive these snapshots and store them elsewhere.
After creating a snapshot, mount the snapshot and move the data to separate storage. Your system might try to compress the backup images as you move them offline. Consider the following procedure to fully archive the data from the snapshot:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | tar -czf mdb-snap01.tar.gz
The above command sequence:
Ensures that the /dev/vg0/mdb-snap01 device is not mounted.
Does a block level copy of the entire snapshot image using the dd command, and compresses the result in a gzipped tar archive in the current working directory.
Warning
This command will create a large tar.gz file in your current working directory. Make sure that you run this command in a file system that has enough free space.
To restore a backup created with the above method, use the following procedure:
lvcreate --size 1G --name mdb-new vg0
tar -xzf mdb-snap01.tar.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
The above sequence:
Creates a new logical volume named mdb-new, in the /dev/vg0 volume group. The path to the new device will be /dev/vg0/mdb-new.
Warning
This volume will have a maximum size of 1 gigabyte. The original file system must have had a total size of 1 gigabyte or smaller, or else the restoration will fail.
Change 1G to your desired volume size.
Uncompresses and unarchives the mdb-snap01.tar.gz into the mdb-new disk image.
Mounts the mdb-new disk image to the /srv/mongodb directory. Modify the mount point to correspond to your MongoDB data file location, or other location as needed.
To restore a backup without writing to a compressed tar archive, use the following sequence:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
You can implement off-system backups using the combined process and SSH.
This sequence is identical to procedures explained above, except that it archives and compresses the backup on a remote system using SSH.
Consider the following procedure:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh username@example.com tar -czf /opt/backup/mdb-snap01.tar.gz
lvcreate --size 1G --name mdb-new vg0
ssh username@example.com tar -xzf /opt/backup/mdb-snap01.tar.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
If your mongod instance does not run with journaling enabled, or if your journal is on a separate volume, obtaining a functional backup of a consistent state is more complicated. As described in this section, you must flush all writes to disk and lock the database to prevent writes during the backup process. If you have a replica set configuration, use a secondary that is not receiving reads (i.e. a hidden member) for your backup.
To flush writes to disk and to “lock” the database (to prevent further writes), issue the db.fsyncLock() method in the mongo shell:
db.fsyncLock();
Perform the backup operation described in Create Snapshot.
To unlock the database after the snapshot has completed, use the following command in the mongo shell:
db.fsyncUnlock();
Note
Changed in version 2.0: MongoDB 2.0 added db.fsyncLock() and db.fsyncUnlock() helpers to the mongo shell. Prior to this version, use the fsync command with the lock option, as follows:
db.runCommand( { fsync: 1, lock: true } );
db.runCommand( { fsync: 1, lock: false } );
Note
The database cannot be locked with db.fsyncLock() while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock(). Disable profiling using db.setProfilingLevel() as follows in the mongo shell:
db.setProfilingLevel(0)
Warning
Changed in version 2.2: When used in combination with fsync or db.fsyncLock(), mongod may block some reads, including those from mongodump, when a queued write operation waits behind the fsync lock.
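Putting these steps together, a minimal backup sequence without journaling might look like the following; the volume names and snapshot size are illustrative, and it assumes a mongod listening on the default local port:

```shell
# Disable profiling, then flush pending writes to disk and block
# further writes.
mongo --eval 'db.setProfilingLevel(0); printjson(db.fsyncLock())'

# Take the block-level snapshot while the database is locked.
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb

# Allow writes to resume.
mongo --eval 'printjson(db.fsyncUnlock())'
```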
If your deployment depends on Amazon’s Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform’s snapshot tool. As a result you may:
Flush all writes to disk and create a write lock to ensure consistent state during the backup process.
If you choose this option see Backup Without Journaling.
Configure LVM to run and hold your MongoDB data files on top of the RAID within your system.
If you choose this option, perform the LVM backup operation described in Create Snapshot.
This section describes the process for writing the entire contents of your MongoDB instance to a file in a binary format. If disk-level snapshots are not available, this approach provides the best option for full system database backups.
See also
The mongodump and mongorestore documents contain complete documentation of these tools. If you have questions about these tools not covered here, please refer to these documents.
If your system has disk level snapshot capabilities, consider the backup methods described in Using Block Level Backup Methods.
The mongodump utility can perform a live backup of data or can work against an inactive set of database files. The mongodump utility can create a dump for an entire server/database/collection (or part of a collection using a query), even when the database is running and active. If you run mongodump without any arguments, the command connects to the local database instance (e.g. 127.0.0.1 or localhost) and creates a database backup named dump/ in the current directory.
Note
The format of data created by mongodump tool from the 2.2 distribution or later is different and incompatible with earlier versions of mongod.
To limit the amount of data included in the database dump, you can specify --database and --collection as options to the mongodump command. For example:
mongodump --collection collection --db test
This command creates a dump of the collection named collection from the database test in a dump/ subdirectory of the current working directory.
Use the --oplog option with mongodump to collect the oplog entries to build a point-in-time snapshot of a database within a replica set. With --oplog, mongodump copies all the data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes. This backup procedure, in conjunction with mongorestore --oplogReplay, allows you to restore a backup that reflects a consistent and specific moment in time.
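For example, a point-in-time dump-and-restore cycle using these options might look like the following; the host name and output path are illustrative:

```shell
# Capture the data plus the oplog entries written while the dump runs.
mongodump --host rs1.example.net --oplog --out /opt/backup/mongodump-pit

# Later, restore the dump and replay the captured oplog entries so that
# the restored data reflects a single moment in time.
mongorestore --oplogReplay /opt/backup/mongodump-pit
```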
If your MongoDB instance is not running, you can use the --dbpath option to specify the location to your MongoDB instance’s database files. mongodump reads from the data files directly with this operation. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongodump in this configuration. Consider the following example:
mongodump --dbpath /srv/mongodb
Additionally, the --host and --port options allow you to specify a non-local host to connect to capture the dump. Consider the following example:
mongodump --host mongodb1.example.net --port 3017 --username user --password pass --out /opt/backup/mongodump-2011-10-24
On any mongodump command you may, as above, specify username and password credentials to specify database authentication.
The mongorestore utility restores a binary backup created by mongodump. Consider the following example command:
mongorestore dump-2011-10-25/
Here, mongorestore imports the database backup located in the dump-2011-10-25 directory to the mongod instance running on the localhost interface. By default, mongorestore looks for a database dump in the dump/ directory and restores that. If you wish to restore to a non-default host, the --host and --port options allow you to specify a non-local host to connect to capture the dump. Consider the following example:
mongorestore --host mongodb1.example.net --port 3017 --username user --password pass /opt/backup/mongodump-2011-10-24
On any mongorestore command you may specify username and password credentials, as above.
If you created your database dump using the --oplog option to ensure a point-in-time snapshot, call mongorestore with the --oplogReplay option, as in the following example:
mongorestore --oplogReplay
You may also consider using the mongorestore --objcheck option to check the integrity of objects while inserting them into the database, or you may consider the mongorestore --drop option to drop each collection from the database before restoring from backups. mongorestore also includes the ability to apply a filter to all input before inserting it into the new database. Consider the following example:
mongorestore --filter '{"field": 1}'
Here, mongorestore only adds documents to the database from the dump located in the dump/ folder if the documents have a field name field that holds a value of 1. Enclose the filter in single quotes (e.g. ') to prevent the filter from interacting with your shell environment.
You can also use mongorestore with the --dbpath option to write directly to a MongoDB instance’s data files, as in the following example:
mongorestore --dbpath /srv/mongodb --journal
Here, mongorestore restores the database dump located in the dump/ folder into the data files located at /srv/mongodb. Additionally, the --journal option ensures that mongorestore records all operations in the durability journal. The journal prevents data file corruption if anything (e.g. power failure, disk failure, etc.) interrupts the restore operation.
See also
mongodump and mongorestore.
The underlying architecture of sharded clusters and replica sets presents several challenges for creating backups. This section describes how to make quality backups in environments with these configurations and how to perform restorations.
Important
To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.
As distributed systems, sharded clusters complicate backup operations. True point-in-time backups are only possible when you stop all write activity from the application. To create a precise moment-in-time snapshot of a cluster, stop all application write activity to the database, capture a backup, and only allow write operations to the database after the backup is complete.
However, you can capture a backup of a cluster that approximates a point-in-time backup by capturing a backup from a secondary member of the replica sets that provide the shards in the cluster at roughly the same moment. If you decide to use an approximate-point-in-time backup method, ensure that your application can operate using a copy of the data that does not reflect a single moment in time.
The following documents describe all sharded cluster related backup procedures:
In most cases, backing up data stored in a replica set is similar to backing up data stored in a single instance. It’s possible to lock a single secondary or slave database and then create a backup from that instance. When you unlock the database, the secondary or slave will catch up with the primary or master. You may also choose to deploy a dedicated hidden member for backup purposes.
If you have a sharded cluster where each shard is itself a replica set, you can use this method to create a backup of the entire cluster without disrupting the operation of the node. In these situations you should still turn off the balancer when you create backups.
For any cluster, using a non-primary/non-master node to create backups is particularly advantageous in that the backup operation does not affect the performance of the primary or master. Replication itself provides some measure of redundancy. Nevertheless, keeping point-in time backups of your cluster to provide for disaster recovery and as an additional layer of protection is crucial.
The Linux kernel provides a system to limit and control the number of threads, connections, and open files on a per-process and per-user basis. These limits prevent single users from using too many system resources. Sometimes, these limits, as configured by the distribution developers, are too low for MongoDB and can cause a number of issues in the course of normal MongoDB operation. Generally, MongoDB should be the only user process on a system, to prevent resource contention.
mongod and mongos each use threads and file descriptors to track connections and manage internal operations. This section outlines the general resource utilization patterns for MongoDB. Use these figures in combination with the actual information about your deployment and its use to determine ideal ulimit settings.
Generally, all mongod and mongos instances, like other processes:
mongod uses background threads for a number of internal processes, including TTL collections, replication, and replica set health checks, which may require a small number of additional resources.
In addition to the threads and file descriptors for client connections, mongos must maintain connections to all config servers and all shards, which includes all members of all replica sets.
For mongos, consider the following behaviors:
mongos instances maintain a connection pool to each shard so that the mongos can reuse connections and quickly fulfill requests without needing to create new connections.
You can limit the number of incoming connections using the maxConns run-time option:
--maxConns
By restricting the number of incoming connections you can prevent a cascade effect where the mongos creates too many connections on the mongod instances.
Note
You cannot set maxConns to a value higher than 20000.
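For example, the following hypothetical invocation starts a mongos that accepts at most 2000 incoming connections; the config server address is illustrative:

```shell
# Cap incoming client connections at 2000 to prevent this mongos from
# opening an excessive number of connections to the shards.
mongos --configdb cfg0.example.net:27019 --maxConns 2000
```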
You can use the ulimit command at the system prompt to check system limits, as in the following example:
$ ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 192276
-n: file descriptors 21000
-l: locked-in-memory size (kb) 40000
-v: address space (kb) unlimited
-x: file locks unlimited
-i: pending signals 192276
-q: bytes in POSIX msg queues 819200
-e: max nice 30
-r: max rt priority 65
-N 15: unlimited
ulimit refers to the per-user limitations for various resources. Therefore, if your mongod instance executes as a user that is also running multiple processes, or multiple mongod processes, you might see contention for these resources. Also, be aware that the processes value (i.e. -u) refers to the combined number of distinct processes and sub-process threads.
You can change ulimit settings by issuing a command in the following form:
ulimit -n <value>
For many distributions of Linux you can change values by substituting the -n option for any possible value in the output of ulimit -a. See your operating system documentation for the precise procedure for changing system limits on running systems.
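For example, you can verify the effect of a change without altering your login shell’s limits by adjusting the soft limit for open files inside a subshell; the value 512 here is purely illustrative:

```shell
# Show the current soft limit for open file descriptors.
ulimit -n

# Lower the soft limit to 512 inside a subshell; the parent shell's
# limits are unchanged when the subshell exits.
( ulimit -S -n 512 && ulimit -n )
```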
Note
After changing the ulimit settings, you must restart the process to take advantage of the modified settings. You can use the /proc file system to see the current limitations on a running process.
Depending on your system’s configuration and default settings, any change to system limits made using ulimit may revert following a system restart. Check your distribution and operating system documentation for more information.
Note
This section applies only to Linux operating systems.
The /proc file-system stores the per-process limits in the file system object located at /proc/<pid>/limits, where <pid> is the process’s PID or process identifier. You can use the following bash function to return the content of the limits object for a process or processes with a given name:
return-limits(){
for process in $@; do
process_pids=`ps -C $process -o pid --no-headers | cut -d " " -f 2`
if [ -z "$process_pids" ]; then
echo "[no $process running]"
else
for pid in $process_pids; do
echo "[$process #$pid -- limits]"
cat /proc/$pid/limits
done
fi
done
}
You can copy and paste this function into a current shell session or load it as part of a script. Call the function with one of the following invocations:
return-limits mongod
return-limits mongos
return-limits mongod mongos
The output of the first command may resemble the following:
[mongod #6809 -- limits]
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8720000 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 192276 192276 processes
Max open files 1024 4096 files
Max locked memory 40960000 40960000 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 192276 192276 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 30 30
Max realtime priority 65 65
Max realtime timeout unlimited unlimited us
Every deployment may have unique requirements and settings; however, the following thresholds and settings are particularly important for mongod and mongos deployments:
Always remember to restart your mongod and mongos instances after changing the ulimit settings to make sure that the settings change takes effect.
| [1] | If you limit the resident memory size on a system running MongoDB you risk allowing the operating system to terminate the mongod process under normal situations. Do not set this value. If the operating system (i.e. Linux) kills your mongod, with the OOM killer, check the output of serverStatus and ensure MongoDB is not leaking memory. |
This page details system configurations that affect MongoDB, especially in production.
To make backups of your MongoDB database, please refer to the backups section.
Always run MongoDB in a trusted environment, with network rules that prevent access from all unknown machines, systems, or networks. As with any sensitive system dependent on network access, your MongoDB deployment should only be accessible to specific systems that require access: application servers, monitoring services, and other MongoDB components.
See documents in the Security section for additional information, specifically:
If you use the Linux kernel, the MongoDB user community has recommended Linux kernel 2.6.36 or later for running MongoDB in production.
Because MongoDB preallocates its database files before using them and because MongoDB uses very large files on average, you should use the Ext4 or XFS file systems if using the Linux kernel:
For MongoDB on Linux use the following recommended configurations:
For random-access use patterns, set readahead values low. For example, setting readahead to a small value such as 32 (16 KB) often works well.
This section describes considerations when running MongoDB in some of the more common virtual environments.
MongoDB is compatible with EC2 and requires no configuration changes specific to the environment.
MongoDB is compatible with VMWare. Some in the MongoDB community have run into issues with VMWare’s memory overcommit feature and suggest disabling the feature.
You can clone a virtual machine running MongoDB. You might use this to spin up a new virtual host that will be added as a member of a replica set. If Journaling is enabled, the clone snapshot will be consistent. If not using journaling, stop mongod, clone, and then restart.
The MongoDB community has encountered issues running MongoDB on OpenVZ.
Configure swap space for your systems. Having swap can prevent issues with memory contention and can prevent the OOM Killer on Linux systems from killing mongod. Because of the way mongod memory-maps its data files, the operating system will never store MongoDB data in swap.
Most MongoDB deployments should use disks backed by RAID-10.
RAID-5 and RAID-6 do not typically provide sufficient performance to support a MongoDB deployment.
RAID-0 provides good write performance but provides limited availability, and reduced performance on read operations, particularly using Amazon’s EBS volumes: as a result, avoid RAID-0 with MongoDB deployments.
Some versions of NFS perform very poorly with MongoDB and NFS is not recommended for use with MongoDB.
Many MongoDB deployments work successfully with Amazon’s Elastic Block Store (EBS) volumes. There are certain intrinsic performance characteristics of EBS volumes that users should consider.
MongoDB is designed specifically with commodity hardware in mind and has few hardware requirements or limitations. MongoDB’s core components run on little-endian hardware, primarily x86/x86_64 processors. Client libraries (i.e. drivers) can run on big- or little-endian systems.
When installing hardware for MongoDB, consider the following:
MongoDB and NUMA (Non-Uniform Memory Access) do not work well together. When running MongoDB on NUMA hardware, disable NUMA for MongoDB and run with an interleave memory policy. NUMA can cause a number of operational problems with MongoDB, including slow performance for periods of time or high system processor usage.
Note
On Linux, MongoDB version 2.0 and greater checks these settings on start up and prints a warning if the system is NUMA-based.
To disable NUMA for MongoDB, use the numactl command and start mongod in the following manner:
numactl --interleave=all /usr/bin/local/mongod
Adjust the proc settings using the following command:
echo 0 > /proc/sys/vm/zone_reclaim_mode
To fully disable NUMA you must perform both operations. However, you can change zone_reclaim_mode without restarting mongod. For more information, see documentation on Proc/sys/vm.
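A minimal wrapper script that performs both steps before starting mongod might look like the following; the mongod path follows the example above, the config file path is illustrative, and the script must run as root:

```shell
#!/bin/sh
# Tell the kernel not to reclaim memory zone-locally; this step does not
# require restarting mongod.
echo 0 > /proc/sys/vm/zone_reclaim_mode

# Start mongod with its memory interleaved across all NUMA nodes.
numactl --interleave=all /usr/bin/local/mongod --config /etc/mongodb.conf
```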
See The MySQL “swap insanity” problem and the effects of NUMA post, which describes the effects of NUMA on databases. This blog post addresses the impact of NUMA for MySQL; however, the issues for MongoDB are similar. The post introduces NUMA, its goals, and illustrates how these goals are not compatible with production databases.
On Linux, use the iostat command to check if disk I/O is a bottleneck for your database. Specify a number of seconds when running iostat to avoid displaying stats covering the time since server boot.
For example:
iostat -xm 2
Use the mount command to see what device your data directory resides on.
Key fields from iostat:
The following tutorials describe basic administrative procedures for MongoDB deployments:
If MongoDB does not shutdown cleanly [1] the on-disk representation of the data files will likely reflect an inconsistent state which could lead to data corruption. [2]
To prevent data inconsistency and corruption, always shut down the database cleanly and use durability journaling. The journal writes data to disk every 100 milliseconds by default and ensures that MongoDB can recover to a consistent state even in the case of an unclean shutdown due to power loss or other system failure.
If you are not running as part of a replica set and do not have journaling enabled, use the following procedure to recover data that may be in an inconsistent state. If you are running as part of a replica set, you should always restore from a backup or restart the mongod instance with an empty dbpath and allow MongoDB to resync the data.
See also
The Administration documents, including Replica Set Syncing, and the documentation on the repair, repairpath, and journal settings.
| [1] | To ensure a clean shut down, use the mongod --shutdown option, your control script, “Control-C” (when running mongod in interactive mode,) or kill $(pidof mongod) or kill -2 $(pidof mongod). |
| [2] | You can also use the db.collection.validate() method to test the integrity of a single collection. However, this process is time consuming, and without journaling you can safely assume that the data is in an invalid state and you should either run the repair operation or resync from an intact member of the replica set. |
If a mongod instance that runs without journaling stops unexpectedly and you are not running with replication, you should always run the repair operation before starting MongoDB again. If you’re using replication, then restore from a backup and allow replication to synchronize your data.
If the mongod.lock file in the data directory specified by dbpath, /data/db by default, is not a zero-byte file, then mongod will refuse to start, and you will find a message that contains the following line in your MongoDB log or output:
Unclean shutdown detected.
This indicates that you need to remove the lockfile and run repair. If you run repair when the mongod.lock file exists without the mongod --repairpath option, you will see a message that contains the following line:
old lock file: /data/db/mongod.lock. probably means unclean shutdown
You must remove the lockfile and run the repair operation before starting the database normally using the following procedure:
Warning
Recovering a member of a replica set.
Do not use this procedure to recover a member of a replica set. Instead you should either restore from a backup or resync from an intact member of the set, as described in Resyncing a Member of a Replica Set.
There are two processes to repair data files that result from an unexpected shutdown:
Use the --repair option in conjunction with the --repairpath option. mongod will read the existing data files, and write the existing data to new data files. This does not modify or alter the existing data files.
You do not need to remove the mongod.lock file before using this procedure.
Use the --repair option. mongod will read the existing data files, write the existing data to new files and replace the existing, possibly corrupt, files with new files.
You must remove the mongod.lock file before using this procedure.
Note
--repair functionality is also available in the shell with the db.repairDatabase() helper for the repairDatabase command.
To repair your data files using the --repairpath option to preserve the original data files unmodified:
Start mongod using --repair to read the existing data files.
mongod --dbpath /data/db --repair --repairpath /data/db0
When this completes, the new repaired data files will be in the /data/db0 directory.
Start mongod using the following invocation to point the dbpath at /data/db0:
mongod --dbpath /data/db0
Once you confirm that the data files are operational you may delete or archive the data files in the /data/db directory.
To repair your data files without preserving the original files, do not use the --repairpath option, as in the following procedure:
Remove the stale lock file:
rm /data/db/mongod.lock
Replace /data/db with your dbpath where your MongoDB instance’s data files reside.
Warning
After you remove the mongod.lock file you must run the --repair process before using your database.
Start mongod using --repair to read the existing data files.
mongod --dbpath /data/db --repair
When this completes, the repaired data files will replace the original data files in the /data/db directory.
Start mongod using the following invocation to point the dbpath at /data/db:
mongod --dbpath /data/db
In normal operation, you should never remove the mongod.lock file and start mongod without repairing the database. Instead, use one of the above methods to recover the database and remove the lock files. In dire situations you can remove the lock file and start the database using the possibly corrupt files to attempt to recover data; however, it is impossible to predict the state of the database in these situations.
If you are not running with journaling, and your database shuts down unexpectedly for any reason, you should always proceed as if your database is in an inconsistent and likely corrupt state. If at all possible restore from backup or if running as a replica set resync from an intact member of the set, as described in Resyncing a Member of a Replica Set.
Following this tutorial, you will convert a single 3-member replica set to a cluster that consists of 2 shards. Each shard will consist of an independent 3-member replica set.
The tutorial uses a test environment running on a local UNIX-like system. You should feel encouraged to “follow along at home.” If you need to perform this process in a production environment, notes throughout the document indicate procedural differences.
The procedure, from a high level, is as follows:
Install MongoDB according to the instructions in the MongoDB Installation Tutorial.
If you have an existing MongoDB replica set deployment, you can omit this step and continue from Deploy Sharding Infrastructure.
Use the following sequence of steps to configure and deploy a replica set and to insert test data.
Create the following directories for the first replica set instance, named firstset:
To create directories, issue the following command:
mkdir -p /data/example/firstset1 /data/example/firstset2 /data/example/firstset3
In a separate terminal window or GNU Screen window, start three mongod instances by running each of the following commands:
mongod --dbpath /data/example/firstset1 --port 10001 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset2 --port 10002 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset3 --port 10003 --replSet firstset --oplogSize 700 --rest
Note
The --oplogSize 700 option restricts the size of the operation log (i.e. oplog) for each mongod instance to 700MB. Without the --oplogSize option, each mongod reserves approximately 5% of the free disk space on the volume. By limiting the size of the oplog, each instance starts more quickly. Omit this setting in production environments.
In a mongo shell session in a new terminal, connect to the mongod instance on port 10001 by running the following command. If you are in a production environment, first read the note below.
mongo localhost:10001/admin
Note
Above and hereafter, if you are running in a production environment or are testing this process with mongod instances on multiple systems, replace “localhost” with a resolvable domain, hostname, or the IP address of your system.
In the mongo shell, initialize the first replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
{"_id" : "firstset", "members" : [{"_id" : 1, "host" : "localhost:10001"},
{"_id" : 2, "host" : "localhost:10002"},
{"_id" : 3, "host" : "localhost:10003"}
]}})
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
In the mongo shell, create and populate a new collection by issuing the following sequence of JavaScript operations:
use test
switched to db test
people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina", "Katie", "Jeff"];
for(var i=0; i<1000000; i++){
name = people[Math.floor(Math.random()*people.length)];
user_id = i;
boolean = [true, false][Math.floor(Math.random()*2)];
added_at = new Date();
number = Math.floor(Math.random()*10001);
db.test_collection.save({"name":name, "user_id":user_id, "boolean": boolean, "added_at":added_at, "number":number });
}
The above operations add one million documents to the collection test_collection. This can take several minutes, depending on your system.
The script adds the documents in the following form:
{ "_id" : ObjectId("4ed5420b8fc1dd1df5886f70"), "name" : "Greg", "user_id" : 4, "boolean" : true, "added_at" : ISODate("2011-11-29T20:35:23.121Z"), "number" : 74 }
This procedure creates the three config databases that store the cluster’s metadata.
Note
For development and testing environments, a single config database is sufficient. In production environments, use three config databases. Because config instances store only the metadata for the sharded cluster, they have minimal resource requirements.
Create the following data directories for three config database instances:
Issue the following command at the system prompt:
mkdir -p /data/example/config1 /data/example/config2 /data/example/config3
In a separate terminal window or GNU Screen window, start the config databases by running the following commands:
mongod --configsvr --dbpath /data/example/config1 --port 20001
mongod --configsvr --dbpath /data/example/config2 --port 20002
mongod --configsvr --dbpath /data/example/config3 --port 20003
In a separate terminal window or GNU Screen window, start a mongos instance by running the following command:
mongos --configdb localhost:20001,localhost:20002,localhost:20003 --port 27017 --chunkSize 1
Note
If you are using the collection created earlier or are just experimenting with sharding, you can use a small --chunkSize (1MB works well). The default chunkSize of 64MB means that your cluster must have 64MB of data before MongoDB’s automatic sharding begins working.
In production environments, do not use a small chunk size.
The configdb options specify the configuration databases (e.g. localhost:20001, localhost:20002, and localhost:20003). The mongos instance runs on the default MongoDB port (i.e. 27017), while the config databases themselves run on ports in the 20001 series. In this example, you may omit the --port 27017 option, as 27017 is the default port.
Add the first shard in mongos. In a new terminal window or GNU Screen session, add the first shard, according to the following procedure:
Connect to the mongos with the following command:
mongo localhost:27017/admin
Add the first shard to the cluster by issuing the addShard command:
db.runCommand( { addShard : "firstset/localhost:10001,localhost:10002,localhost:10003" } )
Observe the following message, which denotes success:
{ "shardAdded" : "firstset", "ok" : 1 }
This procedure deploys a second replica set. This closely mirrors the process used to establish the first replica set above, omitting the test data.
Create the following data directories for the members of the second replica set, named secondset, by issuing the following command:
mkdir -p /data/example/secondset1 /data/example/secondset2 /data/example/secondset3
In three new terminal windows, start three instances of mongod with the following commands:
mongod --dbpath /data/example/secondset1 --port 10004 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset2 --port 10005 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset3 --port 10006 --replSet secondset --oplogSize 700 --rest
Note
As above, the second replica set uses the smaller oplogSize configuration. Omit this setting in production environments.
In the mongo shell, connect to one mongod instance by issuing the following command:
mongo localhost:10004/admin
In the mongo shell, initialize the second replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
{"_id" : "secondset",
"members" : [{"_id" : 1, "host" : "localhost:10004"},
{"_id" : 2, "host" : "localhost:10005"},
{"_id" : 3, "host" : "localhost:10006"}
]}})
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
Add the second replica set to the cluster. Connect to the mongos instance created in the previous procedure and issue the following sequence of commands:
use admin
db.runCommand( { addShard : "secondset/localhost:10004,localhost:10005,localhost:10006" } )
This command returns the following success message:
{ "shardAdded" : "secondset", "ok" : 1 }
Verify that both shards are properly configured by running the listShards command, as in the following example, with output:
db.runCommand({listShards:1})
{
"shards" : [
{
"_id" : "firstset",
"host" : "firstset/localhost:10001,localhost:10003,localhost:10002"
},
{
"_id" : "secondset",
"host" : "secondset/localhost:10004,localhost:10006,localhost:10005"
}
],
"ok" : 1
}
MongoDB must have sharding enabled on both the database and collection levels.
Issue the enableSharding command. The following example enables sharding on the “test” database:
db.runCommand( { enableSharding : "test" } )
{ "ok" : 1 }
MongoDB uses the shard key to distribute documents between shards. Once selected, you cannot change the shard key. Good shard keys:
Typically shard keys are compound, consisting of some sort of hash and some sort of other primary key. Selecting a shard key depends on your data set, application architecture, and usage pattern, and is beyond the scope of this document. For the purposes of this example, we will shard the “number” key. This typically would not be a good shard key for production deployments.
Create the index with the following procedure:
use test
db.test_collection.ensureIndex({number:1})
See also
The Shard Key Overview and Shard Key sections.
Issue the following command:
use admin
db.runCommand( { shardCollection : "test.test_collection", key : {"number":1} })
{ "collectionsharded" : "test.test_collection", "ok" : 1 }
The collection test_collection is now sharded!
Over the next few minutes the Balancer begins to redistribute chunks of documents. You can confirm this activity by switching to the test database and running db.stats() or db.printShardingStatus().
As clients insert additional documents into this collection, mongos distributes the documents evenly between the shards.
In the mongo shell, issue the following commands to return statistics against each cluster:
use test
db.stats()
db.printShardingStatus()
Example output of the db.stats() command:
{
"raw" : {
"firstset/localhost:10001,localhost:10003,localhost:10002" : {
"db" : "test",
"collections" : 3,
"objects" : 973887,
"avgObjSize" : 100.33173458522396,
"dataSize" : 97711772,
"storageSize" : 141258752,
"numExtents" : 15,
"indexes" : 2,
"indexSize" : 56978544,
"fileSize" : 1006632960,
"nsSizeMB" : 16,
"ok" : 1
},
"secondset/localhost:10004,localhost:10006,localhost:10005" : {
"db" : "test",
"collections" : 3,
"objects" : 26125,
"avgObjSize" : 100.33286124401914,
"dataSize" : 2621196,
"storageSize" : 11194368,
"numExtents" : 8,
"indexes" : 2,
"indexSize" : 2093056,
"fileSize" : 201326592,
"nsSizeMB" : 16,
"ok" : 1
}
},
"objects" : 1000012,
"avgObjSize" : 100.33176401883178,
"dataSize" : 100332968,
"storageSize" : 152453120,
"numExtents" : 23,
"indexes" : 4,
"indexSize" : 59071600,
"fileSize" : 1207959552,
"ok" : 1
}
Example output of the db.printShardingStatus() command:
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "firstset", "host" : "firstset/localhost:10001,localhost:10003,localhost:10002" }
{ "_id" : "secondset", "host" : "secondset/localhost:10004,localhost:10006,localhost:10005" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "firstset" }
test.test_collection chunks:
secondset 5
firstset 186
[...]
In a few moments you can run these commands for a second time to demonstrate that chunks are migrating from firstset to secondset.
When this procedure is complete, you will have converted a replica set into a cluster where each shard is itself a replica set.
MongoDB provides the copydb and clone database commands to support migrations of entire logical databases between mongod instances. With these commands you can copy data between instances with a simple interface, without the need for an intermediate stage. The db.cloneDatabase() and db.copyDatabase() methods provide helpers for these operations in the mongo shell.
Data migrations that require an intermediate stage or that involve more than one database instance are beyond the scope of this tutorial. copydb and clone are best suited to use cases that resemble the following:
Also consider the Backup and Restoration Strategies and Importing and Exporting MongoDB Data documentation for more related information.
To copy a database from one MongoDB instance to another and rename the database in the process, use the copydb command, or the db.copyDatabase() helper in the mongo shell.
Use the following procedure to copy the database named test on server db0.example.net to the server named db1.example.net and rename it to records in the process:
Verify that the database test exists on the source mongod instance running on the db0.example.net host.
Connect to the destination server, running on the db1.example.net host, using the mongo shell.
Model your operation on the following command:
db.copyDatabase( "test", "records", "db0.example.net" )
You can also use copydb or the db.copyDatabase() helper to:
Use the following procedure to rename the test database to records on a single mongod instance:
Connect to the mongod using the mongo shell.
Model your operation on the following command:
db.copyDatabase( "test", "records" )
To copy a database from a source MongoDB instance that has authentication enabled, you can specify authentication credentials to the copydb command or the db.copyDatabase() helper in the mongo shell.
In the following operation, you will copy the test database from the mongod running on db0.example.net to the records database on the local instance (e.g. db1.example.net.) Because the mongod instance running on db0.example.net requires authentication for all connections, you will need to pass db.copyDatabase() authentication credentials, as in the following procedure:
Connect to the destination mongod instance running on the db1.example.net host using the mongo shell.
Issue the following command:
db.copyDatabase( "test", "records", db0.example.net, "<username>", "<password>")
Replace <username> and <password> with your authentication credentials.
The clone command copies a database between mongod instances like copydb; however, clone preserves the database name from the source instance on the destination mongod.
For many operations, clone is functionally equivalent to copydb, but it has a simpler syntax and a narrower use. The mongo shell provides the db.cloneDatabase() helper as a wrapper around clone.
You can use the following procedure to clone a database from the mongod instance running on db0.example.net to the mongod running on db1.example.net:
Connect to the destination mongod instance running on the db1.example.net host using the mongo shell.
Issue the following command to specify the name of the database you want to copy:
use records
Use the following operation to initiate the clone operation:
db.cloneDatabase( "db0.example.net" )
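Once the clone finishes, you can confirm the copy from the same shell session. This is only a sketch; the collections listed will depend on your data:

```javascript
// Still connected to db1.example.net with "records" selected:
db.getCollectionNames()   // lists the cloned collections
db.stats()                // compare object counts and sizes with the source
```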
The database profiler collects fine-grained data about MongoDB write operations, cursors, and database commands on a running mongod instance. You can enable profiling on a per-database or per-instance basis. The profiling level is also configurable when enabling profiling.
The database profiler writes all the data it collects to the system.profile collection, which is a capped collection. See Database Profiler Output for an overview of the data in the system.profile documents created by the profiler.
This document outlines a number of key administration options for the database profiler. For additional related information, consider the following resources:
The following profiling levels are available:
0 - the profiler is off, does not collect any data.
1 - collects profiling data for slow operations only. By default slow operations are those slower than 100 milliseconds.
You can modify the threshold for “slow” operations with the slowms runtime option or the setParameter command. See the Specify the Threshold for Slow Operations section for more information.
2 - collects profiling data for all database operations.
You can enable database profiling from the mongo shell or through a driver using the profile command. This section will describe how to do so from the mongo shell. See your driver documentation if you want to control the profiler from within your application.
When you enable profiling, you also set the log level. The profiler records data in the system.profile collection. MongoDB creates the system.profile collection in a database after you enable profiling for that database.
To enable profiling and set the log level, use the db.setProfilingLevel() helper in the mongo shell, passing the log level as a parameter. For example, to enable profiling for all database operations, consider the following operation in the mongo shell:
db.setProfilingLevel(2)
The shell returns a document showing the previous level of profiling. The "ok" : 1 key-value pair indicates the operation succeeded:
{ "was" : 0, "slowms" : 100, "ok" : 1 }
To verify the new setting, see the Check Profiling Level section.
The threshold for slow operations applies to the entire mongod instance. When you change the threshold, you change it for all databases on the instance.
Important
Changing the slow operation threshold for the database profiler also affects the logging subsystem’s slow operation threshold for the entire mongod instance. Always set the threshold to the highest useful value.
By default the slow operation threshold is 100 milliseconds. Databases with a log level of 1 will log operations slower than 100 milliseconds.
To change the threshold, pass two parameters to the db.setProfilingLevel() helper in the mongo shell. The first parameter sets the log level for the current database, and the second sets the default slow operation threshold for the entire mongod instance.
For example, the following command sets the log level for the current database to 0, which disables profiling, and sets the slow-operation threshold for the mongod instance to 20 milliseconds. Any database on the instance with a log level of 1 will use this threshold:
db.setProfilingLevel(0,20)
To view the profiling level, issue the following from the mongo shell:
db.getProfilingStatus()
The shell returns a document similar to the following:
{ "was" : 0, "slowms" : 100 }
The was field indicates the current level of profiling.
The slowms field indicates how long, in milliseconds, an operation must run to pass the “slow” threshold. MongoDB will log operations that take longer than the threshold if the profiling level is 1. For an explanation of log levels, see Profiling Levels.
To return only the log level, use the db.getProfilingLevel() helper in the mongo shell, as in the following:
db.getProfilingLevel()
To disable profiling, use the following helper in the mongo shell:
db.setProfilingLevel(0)
For development purposes in testing environments, you can enable database profiling for an entire mongod instance. The profiling level applies to all databases provided by the mongod instance.
To enable profiling for a mongod instance, pass the following parameters to mongod at startup or within the configuration file:
mongod --profile=1 --slowms=15
This sets the profiling level to 1, which collects profiling data for slow operations only, and defines slow operations as those that last longer than 15 milliseconds.
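The same settings can also go in the configuration file. The following is a sketch, assuming the 2.2-era configuration file format and a hypothetical file path:

```ini
# /etc/mongodb.conf (hypothetical path)
profile = 1    # collect profiling data for slow operations only
slowms = 15    # define "slow" as longer than 15 milliseconds
```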
The database profiler logs information about database operations in the system.profile collection.
To view log information, query the system.profile collection. For example queries, see the following section.
For an explanation of the output data, see Database Profiler Output.
This section displays example queries to the system.profile collection. For an explanation of the query output, see Database Profiler Output.
To return the most recent 10 log entries in the system.profile collection, run a query similar to the following:
db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()
To return all operations except command operations ($cmd), run a query similar to the following:
db.system.profile.find( { op: { $ne : 'command' } } ).pretty()
To return operations for a particular collection, run a query similar to the following. This example returns operations in the mydb database’s test collection:
db.system.profile.find( { ns : 'mydb.test' } ).pretty()
To return operations slower than 5 milliseconds, run a query similar to the following:
db.system.profile.find( { millis : { $gt : 5 } } ).pretty()
To return information from a certain time range, run a query similar to the following:
db.system.profile.find(
{
ts : {
$gt : new ISODate("2012-12-09T03:00:00Z") ,
$lt : new ISODate("2012-12-09T03:40:00Z")
}
}
).pretty()
The following example looks at the time range, suppresses the user field from the output to make it easier to read, and sorts the results by how long each operation took to run:
db.system.profile.find(
{
ts : {
$gt : new ISODate("2011-07-12T03:00:00Z") ,
$lt : new ISODate("2011-07-12T03:40:00Z")
}
},
{ user : 0 }
).sort( { millis : -1 } )
On a database that has profiling enabled, the show profile helper in the mongo shell displays the 5 most recent operations that took at least 1 millisecond to execute. Issue show profile from the mongo shell, as follows:
show profile
When enabled, profiling has a minor effect on performance. The system.profile collection is a capped collection with a default size of 1 megabyte. A collection of this size can typically store several thousand profile documents, but some applications may use more or less profiling data per operation.
To change the size of the system.profile collection, you must:
For example, to create a new system.profile collection that is 4,000,000 bytes, use the following sequence of operations in the mongo shell:
db.setProfilingLevel(0)
db.system.profile.drop()
db.createCollection( "system.profile", { capped: true, size:4000000 } )
db.setProfilingLevel(1)
Log rotation archives the current log file and starts a new one. Specifically, log rotation renames the current log file by appending a timestamp to the filename, [1] opens a new log file, and finally closes the old log. MongoDB will only rotate logs when you use the logRotate command or issue the process a SIGUSR1 signal, as described in this procedure.
See also
For information on logging, see the Process Logging section.
The following steps create and rotate a log file:
Start a mongod with verbose logging, with appending enabled, and with the following log file:
mongod -v --logpath /var/log/mongodb/server1.log --logappend
In a separate terminal, list the matching files:
ls /var/log/mongodb/server1.log*
The results should resemble the following:
server1.log
Rotate the log file using one of the following methods.
From the mongo shell, issue the logRotate command from the admin database:
use admin
db.runCommand( { logRotate : 1 } )
This is the only available method to rotate log files on Windows systems.
From the UNIX shell, rotate logs for a single process by issuing the following command:
kill -SIGUSR1 <mongod process id>
From the UNIX shell, rotate logs for all mongod processes on a machine by issuing the following command:
killall -SIGUSR1 mongod
List the matching files again:
ls /var/log/mongodb/server1.log*
For results you get something similar to the following. The timestamps will be different.
server1.log server1.log.2011-11-24T23-30-00
The example results indicate a log rotation performed at 11:30 pm UTC on November 24th, 2011. The original log file is the one with the timestamp. The new log is the server1.log file.
If you issue a second logRotate command an hour later, then an additional file would appear when listing matching files, as in the following example:
server1.log server1.log.2011-11-24T23-30-00 server1.log.2011-11-25T00-30-00
This operation does not modify the server1.log.2011-11-24T23-30-00 file created earlier, while server1.log.2011-11-25T00-30-00 is the previous server1.log file, renamed. server1.log is a new, empty file that receives all new log output.
[1] MongoDB renders this timestamp in UTC (GMT), formatted as ISODate.
Important
Use this procedure only if you must have indexes that are compatible with a version of MongoDB earlier than 2.0.
MongoDB version 2.0 introduced the {v:1} index format. MongoDB versions 2.0 and later support both the {v:1} format and the earlier {v:0} format.
MongoDB versions prior to 2.0, however, support only the {v:0} format. If you need to roll back MongoDB to a version prior to 2.0, you must drop and re-create your indexes.
To build pre-2.0 indexes, use the dropIndexes() and ensureIndex() methods. You cannot simply reindex the collection. When you reindex on versions that only support {v:0} indexes, the v fields in the index definition still hold values of 1, even though the indexes would now use the {v:0} format. If you were to upgrade again to version 2.0 or later, these indexes would not work.
Example
Suppose you rolled back from MongoDB 2.0 to MongoDB 1.8, and suppose you had the following index on the items collection:
{ "v" : 1, "key" : { "name" : 1 }, "ns" : "mydb.items", "name" : "name_1" }
The v field tells you the index is a {v:1} index, which is incompatible with version 1.8.
To drop the index, issue the following command:
db.items.dropIndex( { name : 1 } )
To recreate the index as a {v:0} index, issue the following command:
db.items.ensureIndex( { name : 1 } , { v : 0 } )
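To confirm that the rebuilt index uses the {v:0} format, you can inspect the collection's index definitions:

```javascript
db.items.getIndexes()
// The rebuilt index on { name : 1 } should now report "v" : 0
// in its definition document.
```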
These documents outline basic security practices and risk management strategies. Additionally, this section includes MongoDB Tutorials that outline basic network filter and firewall rules to configure trusted environments for MongoDB.
As with all software running in a networked environment, administrators of MongoDB must consider security and risk exposures for a MongoDB deployment. There are no magic solutions for risk mitigation, and maintaining a secure MongoDB deployment is an ongoing process. This document takes a Defense in Depth approach to securing MongoDB deployments and addresses a number of different methods for managing risk and reducing risk exposure.
The intent of a Defense in Depth approach is to ensure there are no exploitable points of failure in your deployment that could allow an intruder or un-trusted party to access the data stored in the MongoDB database. The easiest and most effective way to reduce the risk of exploitation is to run MongoDB in a trusted environment, limit access, follow a system of least privilege, and follow best development and deployment practices. See the Strategies for Reducing Risk section for more information.
The most effective way to reduce risk for MongoDB deployments is to run your entire MongoDB deployment, including all MongoDB components (i.e. mongod, mongos and application instances) in a trusted environment. Trusted environments use the following strategies to control access:
You may further reduce risk by:
Continue reading this document for more information on specific strategies and configurations to help reduce the risk exposure of your application.
10gen takes the security of MongoDB and associated products very seriously. If you discover a vulnerability in MongoDB or another 10gen product, or would like to know more about our vulnerability reporting and response process, see the Vulnerability Notification document.
The following list includes all default ports used by MongoDB:
27017 is the default port for mongod and mongos instances. You can change this port with port or --port.
27018 is the default port when running with the --shardsvr runtime operation or shardsvr setting.
27019 is the default port when running with the --configsvr runtime operation or configsvr setting.
28017 is the default port for the web status page. This is always accessible at a port that is 1000 greater than the port determined by port.
By default MongoDB programs (i.e. mongos and mongod) will bind to all available network interfaces (i.e. IP addresses) on a system. The next section outlines various runtime options that allow you to limit access to MongoDB programs.
You can limit the network exposure with the following configuration options:
the nohttpinterface setting for mongod and mongos instances.
Disables the “home” status page, which would run on port 28017 by default. The status interface is read-only by default. You may also specify this option on the command line as mongod --nohttpinterface or mongos --nohttpinterface. Authentication does not control or affect access to this interface.
Important
Disable this option for production deployments. If you do leave this interface enabled, you should only allow trusted clients to access this port.
the port setting for mongod and mongos instances.
Changes the main port on which the mongod or mongos instance listens for connections. Changing the port does not meaningfully reduce risk or limit exposure.
You may also specify this option on the command line as mongod --port or mongos --port.
Whatever port you attach mongod and mongos instances to, you should only allow trusted clients to connect to this port.
the rest setting for mongod and mongos instances.
Enables a fully interactive administrative REST interface, which is disabled by default. The status interface, which is enabled by default, is read-only. This configuration makes that interface fully interactive. The REST interface does not support any authentication and you should always restrict access to this interface to only allow trusted clients to connect to this port.
You may also enable this interface on the command line as mongod --rest.
Important
Disable this option for production deployments. If you do leave this interface enabled, you should only allow trusted clients to access this port.
the bind_ip setting for mongod and mongos instances.
Limits the network interfaces on which MongoDB programs listen for incoming connections. You can also specify a number of interfaces by passing bind_ip a comma-separated list of IP addresses. You can use the mongod --bind_ip and mongos --bind_ip options on the command line at run time to limit the network accessibility of a MongoDB program.
Important
Make sure that your mongod and mongos instances are only accessible on trusted networks. If your system has more than one network interface, bind MongoDB programs to the private or internal network interface.
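A configuration file sketch combining these network options; the interface addresses are hypothetical and must be adjusted for your environment:

```ini
# configuration file fragment
bind_ip = 127.0.0.1,10.8.0.10   # loopback plus one private interface
port = 27017                    # the default port, shown for clarity
nohttpinterface = true          # disable the HTTP status page
```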
Firewalls allow administrators to filter and control access to a system by providing granular control over network communications. For administrators of MongoDB, the following capabilities are important:
On Linux systems, the iptables interface provides access to the underlying netfilter firewall. On Windows systems, the netsh command line interface provides access to the underlying Windows Firewall. For additional information about firewall configuration, consider the following documents:
For best results and to minimize overall exposure, ensure that only traffic from trusted sources can reach mongod and mongos instances and that the mongod and mongos instances can only connect to trusted systems.
See also
For MongoDB deployments on Amazon’s web services, see the Amazon EC2 wiki page, which addresses Amazon’s Security Groups and other EC2-specific security features.
Virtual private networks, or VPNs, make it possible to link two networks over an encrypted and limited-access trusted network. Typically MongoDB users who use VPNs use SSL rather than IPSEC VPNs for performance reasons.
Depending on configuration and implementation, VPNs provide certificate validation and a choice of encryption protocols, which require a rigorous level of authentication and identification of all clients. Furthermore, because VPNs provide a secure tunnel, using a VPN connection to control access to your MongoDB instance can prevent tampering and “man-in-the-middle” attacks.
Always run the mongod or mongos process as a unique user with the minimum required permissions and access. Never run a MongoDB program as root or as an administrative user. The system users that run the MongoDB processes should have robust authentication credentials that prevent unauthorized or casual access.
To further limit the environment, you can run the mongod or mongos process in a chroot environment. Both user-based access restrictions and chroot configuration follow recommended conventions for administering all daemon processes on Unix-like systems.
You can disable anonymous access to the database by enabling authentication using the auth setting, as detailed in the Authentication section.
MongoDB provides basic support for authentication with the auth setting. For multi-instance deployments (i.e. replica sets and sharded clusters), use the keyFile setting, which implies auth and allows intra-deployment authentication and operation. Be aware of the following behaviors of MongoDB’s authentication system:
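As a minimal sketch (the file path and key file location are illustrative, not prescriptive), enabling authentication in a configuration file of this era might look like the following:

```
# /etc/mongodb.conf -- illustrative fragment
auth = true
# For replica sets and sharded clusters, use a shared key file instead;
# keyFile implies auth:
# keyFile = /srv/mongodb/keyfile
```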
Authentication is disabled by default.
MongoDB provisions access on a per-database level. Users either have read only access to a database or normal access to a database that permits full read and write access to the database. Normal access conveys the ability to add additional users to the database.
The system.users collection in each database stores all credentials. You can query the authorized users with the following operation:
db.system.users.find()
The admin database is unique. Users with normal access to the admin database have read and write access to all databases. Users with read only access to the admin database have read only access to all databases.
Additionally the admin database exposes several commands and functionality, such as listDatabases.
Once authenticated a normal user has full read and write access to a database.
If you have authenticated to a database as a normal (read and write) user, authenticating as a read-only user on the same database invalidates the earlier authentication, leaving the current connection with read-only access.
If you have authenticated to the admin database as a normal (read and write) user, logging into a different database as a read-only user will not invalidate the authentication to the admin database. In this situation, the client will be able to read and write data to this second database.
When setting up authentication for the first time you must either:
New in version 2.0: Support for authentication with sharded clusters. Before 2.0 sharded clusters had to run with trusted applications and a trusted networking configuration.
Consider the Control Access to MongoDB Instances with Authentication document which outlines procedures for configuring and maintaining users and access with MongoDB’s authentication system.
| [1] | Because of SERVER-6591, you cannot add the first user to a sharded cluster using the localhost connection in 2.2. If you are running a 2.2 sharded cluster, and want to enable authentication, you must deploy the cluster and add the first user to the admin database before restarting the cluster to run with keyFile. |
Simply limiting access to a mongod is not sufficient for totally controlling risk exposure. Consider the recommendations in the following sections for limiting exposure to other interface-related risks.
Be aware of the following capabilities and behaviors of the mongo shell:
mongo will evaluate a .js file passed to the mongo --eval option. The mongo shell does not validate JavaScript input passed to --eval.
mongo will evaluate a .mongorc.js file before starting. You can disable this behavior by passing the mongo --norc option.
On Linux and Unix systems, mongo reads the .mongorc.js file from $HOME/.mongorc.js (i.e. ~/.mongorc.js); on Windows, mongo.exe reads the .mongorc.js file from %HOME%\.mongorc.js or %HOMEDRIVE%\%HOMEPATH%\.mongorc.js.
The HTTP status interface provides a web-based interface that includes a variety of operational data, logs, and status reports regarding the mongod or mongos instance. The HTTP interface is always available on the port numbered 1000 greater than the primary mongod port. By default this is 28017, but is indirectly set using the port option which allows you to configure the primary mongod port.
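The relationship between the primary port and the status interface port is simple arithmetic; the following sketch computes the HTTP port from a configured mongod port:

```shell
# The HTTP status interface listens 1000 ports above the mongod port.
mongod_port=27017
http_port=$((mongod_port + 1000))
echo "$http_port"   # prints 28017
```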
Without the rest setting, this interface is entirely read-only, and limited in scope; nevertheless, this interface may represent an exposure. To disable the HTTP interface, set the nohttpinterface run time option or the --nohttpinterface command line option.
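A sketch of the corresponding configuration file entry, using the setting name described above:

```
# mongod configuration file -- illustrative fragment
nohttpinterface = true
```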
The REST API to MongoDB provides additional information and write access on top of the HTTP Status interface. The REST interface is disabled by default, and is not recommended for production use.
While the REST API does not provide any support for insert, update, or remove operations, it does provide administrative access, and its accessibility represents a vulnerability in a secure environment.
If you must use the REST API, control and limit access to it. The REST API does not include any support for authentication, even when running with auth enabled.
See the following documents for instructions on restricting access to the REST API interface:
To support audit requirements, you may need to encrypt data stored in MongoDB. For best results, encrypt this data in the application layer by encrypting the content of fields that hold secure data.
Additionally, 10gen has a partnership with Gazzang to encrypt and secure sensitive data within MongoDB. The solution encrypts data in real time, and Gazzang provides advanced key management that ensures only authorized processes can access this data. The Gazzang software ensures that the cryptographic keys remain safe and helps ensure compliance with standards including HIPAA, PCI-DSS, and FERPA. For more information, consider the following resources:
10gen values the privacy and security of all users of MongoDB, and we work very hard to ensure that MongoDB and related tools minimize risk exposure and increase the security and integrity of data and environments using MongoDB.
If you believe you have discovered a vulnerability in MongoDB or a related product or have experienced a security incident related to MongoDB, please report these issues so that 10gen can respond appropriately and work to prevent additional issues in the future. All vulnerability reports should contain as much information as possible so that we can move quickly to resolve the issue. In particular, please include the following:
10gen will respond to all vulnerability notifications within 48 hours.
10gen prefers jira.mongodb.org for all communication regarding MongoDB and related products.
Submit a ticket in the “Core Server Security” project, at: <https://jira.mongodb.org/SECURITY/>. The ticket number will become the reference identifier for the issue for its lifetime, and you can use this identifier for tracking purposes.
10gen will respond to any vulnerability notification received in a Jira case posted to the SECURITY project.
While Jira is the preferred communication vector, you may also report vulnerabilities via email to <security@10gen.com>.
You may encrypt email using our public key to ensure the privacy of any sensitive information in your vulnerability report.
10gen will respond to any vulnerability notification received via email with an email that contains a reference number (i.e. a ticket in the SECURITY project).
10gen will validate all submitted vulnerabilities. 10gen will use Jira to track all communications regarding the vulnerability, which may include requests for clarification and for additional information. If needed, 10gen representatives can set up a conference call to exchange information regarding the vulnerability.
10gen requests that you do not publicly disclose any information regarding the vulnerability or exploit until 10gen has had the opportunity to analyze the vulnerability, respond to the notification, and to notify key users, customers, and partners if needed.
The amount of time required to validate a reported vulnerability depends on the complexity and severity of the issue. 10gen takes all reported vulnerabilities very seriously and will always ensure that there is a clear and open channel of communication with the reporter of the vulnerability.
After validating the issue, 10gen will coordinate public disclosure of the issue with the reporter in a mutually agreed timeframe and format. If required or requested, the reporter of a vulnerability will receive credit in the published security bulletin.
On contemporary Linux systems, the iptables program provides methods for managing the Linux Kernel’s netfilter or network packet filtering capabilities. These firewall rules make it possible for administrators to control what hosts can connect to the system, and limit risk exposure by limiting the hosts that can connect to a system.
This document outlines basic firewall configurations for iptables firewalls on Linux. Use these approaches as a starting point for your larger networking organization. For a detailed overview of security practices and risk management for MongoDB, see Security Practices and Management.
See also
For MongoDB deployments on Amazon’s web services, see the Amazon EC2 wiki page, which addresses Amazon’s Security Groups and other EC2-specific security features.
Rules in iptables configurations fall into chains, which describe the process for filtering and processing specific streams of traffic. Chains have an order, and packets must pass through earlier rules in a chain to reach later rules. This document addresses only the following two chains:
Given the default ports of all MongoDB processes, you must configure networking rules that permit only required communication between your application and the appropriate mongod and mongos instances.
By default, the policy of iptables is to allow all connections and traffic unless explicitly disabled. The configuration changes outlined in this document will create rules that explicitly allow traffic from specific addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed. When you have properly configured your iptables rules to allow only the traffic that you want to permit, you can Change Default Policy to DROP.
This section contains a number of patterns and examples for configuring iptables for use with MongoDB deployments. If you have configured different ports using the port configuration setting, you will need to modify the rules accordingly.
This pattern is applicable to all mongod instances running as standalone instances or as part of a replica set.
The goal of this pattern is to explicitly allow traffic to the mongod instance from the application server. In the following examples, replace <ip-address> with the IP address of the application server:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27017 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27017 -m state --state ESTABLISHED -j ACCEPT
The first rule allows all incoming traffic from <ip-address> on port 27017, which allows the application server to connect to the mongod instance. The second rule allows outgoing traffic from the mongod to reach the application server.
Optional
If you have only one application server, you can replace <ip-address> with the IP address itself, such as 198.51.100.55, or express it using CIDR notation as 198.51.100.55/32. If you want to permit a larger block of possible IP addresses, you can allow traffic from a /24 network using one of the following specifications for <ip-address>:
10.10.10.10/24
10.10.10.10/255.255.255.0
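To illustrate what the /24 covers, the following sketch derives the /24 network block that contains a given address by zeroing the host octet (the address is the hypothetical one from above):

```shell
ip="198.51.100.55"
# Split the dotted quad and zero the final (host) octet.
IFS=. read -r o1 o2 o3 o4 <<< "$ip"
echo "${o1}.${o2}.${o3}.0/24"   # prints 198.51.100.0/24
```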
mongos instances provide query routing for sharded clusters. Clients connect to mongos instances, which behave from the client’s perspective as mongod instances. In turn, the mongos connects to all mongod instances that are components of the sharded cluster.
Use the same iptables command to allow traffic to and from these instances as you would from the mongod instances that are members of the replica set. Take the configuration outlined in the Traffic to and from mongod Instances section as an example.
Config servers host the config database that stores metadata for sharded clusters. Each production cluster has three config servers, initiated using the mongod --configsvr option. [1] Config servers listen for connections on port 27019. As a result, add the following iptables rules to the config server to allow incoming and outgoing connections on port 27019, for connections to the other config servers.
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27019 -m state --state ESTABLISHED -j ACCEPT
Replace <ip-address> with the address or address space of all the mongod instances that provide config servers.
Additionally, config servers need to allow incoming connections from all of the mongos instances in the cluster and all mongod instances in the cluster. Add rules that resemble the following:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT
Replace <ip-address> with the address of the mongos instances and the shard mongod instances.
| [1] | You can also run a config server by setting the configsvr option in a configuration file. |
For shard servers, running as mongod --shardsvr, [2] the default port number is 27018. You must configure the following iptables rules to allow traffic to and from each shard:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 27018 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT
Replace the <ip-address> specification with the IP address of all mongod instances. This allows you to permit incoming and outgoing traffic between all shards, including constituent replica set members, to:
Furthermore, shards need to be able to make outgoing connections to:
Create a rule that resembles the following, and replace the <ip-address> with the address of the config servers and the mongos instances:
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT
| [2] | You can also specify the shard server option using the shardsvr setting in the configuration file. Shard members are also often conventional replica sets using the default port. |
| [3] | All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations. |
The mongostat diagnostic tool, when running with the --discover option, needs to be able to reach all components of a cluster, including the config servers, the shard servers, and the mongos instances.
If your monitoring system needs to access the HTTP interface, add the following rule to the chain:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 28017 -m state --state NEW,ESTABLISHED -j ACCEPT
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface. For all deployments, you should restrict access to this port to only the monitoring instance.
Optional
For shard server mongod instances running with shardsvr, the rule would resemble the following:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 28018 -m state --state NEW,ESTABLISHED -j ACCEPT
For config server mongod instances running with configsvr, the rule would resemble the following:
iptables -A INPUT -s <ip-address> -p tcp --destination-port 28019 -m state --state NEW,ESTABLISHED -j ACCEPT
The default policy for iptables chains is to allow all traffic. After completing all iptables configuration changes, you must change the default policy to DROP so that all traffic that isn’t explicitly allowed as above will not be able to reach components of the MongoDB deployment. Issue the following commands to change this policy:
iptables -P INPUT DROP
iptables -P OUTPUT DROP
This section contains a number of basic operations for managing and using iptables. There are various front end tools that automate some aspects of iptables configuration, but at the core all iptables front ends provide the same basic functionality:
By default, all iptables rules are stored only in memory. When your system restarts, your firewall rules will revert to their defaults. When you have tested a rule set and verified that it effectively controls traffic, you can use the following operations to make the rule set persistent.
On Red Hat Enterprise Linux, Fedora Linux, and related distributions you can issue the following command:
service iptables save
On Debian, Ubuntu, and related distributions, you can use the following command to dump the iptables rules to the /etc/iptables.conf file:
iptables-save > /etc/iptables.conf
Run the following operation to restore the network rules:
iptables-restore < /etc/iptables.conf
Place this command in your rc.local file, or in the /etc/network/if-up.d/iptables file with other similar operations.
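For example, an /etc/network/if-up.d/iptables script (path as above) might contain nothing more than the restore operation; make the script executable so that the system runs it when an interface comes up:

```
#!/bin/sh
# Restore the saved firewall rules whenever an interface comes up.
iptables-restore < /etc/iptables.conf
```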
To list all currently applied iptables rules, use the following operation at the system shell:
iptables -L
If you make a configuration mistake when entering iptables rules or simply need to revert to the default rule set, you can use the following operation at the system shell to flush all rules:
iptables -F
If you’ve already made your iptables rules persistent, you will need to repeat the appropriate procedure in the Make all iptables Rules Persistent section.
On Windows Server systems, the netsh program provides methods for managing the Windows Firewall. These firewall rules make it possible for administrators to control what hosts can connect to the system, and limit risk exposure by limiting the hosts that can connect to a system.
This document outlines basic Windows Firewall configurations. Use these approaches as a starting point for your larger networking organization. For a detailed overview of security practices and risk management for MongoDB, see Security Practices and Management.
See also
Windows Firewall documentation from Microsoft.
Windows Firewall processes rules in an order determined by rule type, parsing them in the following order:
By default, the policy in Windows Firewall allows all outbound connections and blocks all incoming connections.
Given the default ports of all MongoDB processes, you must configure networking rules that permit only required communication between your application and the appropriate mongod.exe and mongos.exe instances.
The configuration changes outlined in this document will create rules which explicitly allow traffic from specific addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed.
You can configure the Windows Firewall using the netsh command line tool or through a Windows application. On Windows Server 2008 this application is Windows Firewall With Advanced Security in Administrative Tools. On previous versions of Windows Server, access the Windows Firewall application in the System and Security control panel.
The procedures in this document use the netsh command line tool.
This section contains a number of patterns and examples for configuring Windows Firewall for use with MongoDB deployments. If you have configured different ports using the port configuration setting, you will need to modify the rules accordingly.
This pattern is applicable to all mongod.exe instances running as standalone instances or as part of a replica set. The goal of this pattern is to explicitly allow traffic to the mongod.exe instance from the application server.
netsh advfirewall firewall add rule name="Open mongod port 27017" dir=in action=allow protocol=TCP localport=27017
This rule allows all incoming traffic to port 27017, which allows the application server to connect to the mongod.exe instance.
Windows Firewall also allows enabling network access for an entire application rather than for a specific port, as in the following example:
netsh advfirewall firewall add rule name="Allowing mongod" dir=in action=allow program="C:\mongodb\bin\mongod.exe"
You can allow all access for a mongos.exe server, with the following invocation:
netsh advfirewall firewall add rule name="Allowing mongos" dir=in action=allow program="C:\mongodb\bin\mongos.exe"
mongos.exe instances provide query routing for sharded clusters. Clients connect to mongos.exe instances, which behave from the client’s perspective as mongod.exe instances. In turn, the mongos.exe connects to all mongod.exe instances that are components of the sharded cluster.
Use the same Windows Firewall command to allow traffic to and from these instances as you would from the mongod.exe instances that are members of the replica set.
netsh advfirewall firewall add rule name="Open mongod shard port 27018" dir=in action=allow protocol=TCP localport=27018
Configuration servers host the config database that stores metadata for sharded clusters. Each production cluster has three configuration servers, initiated using the mongod --configsvr option. [1] Configuration servers listen for connections on port 27019. As a result, add the following Windows Firewall rules to the config server to allow incoming and outgoing connections on port 27019, for connections to the other config servers.
netsh advfirewall firewall add rule name="Open mongod config svr port 27019" dir=in action=allow protocol=TCP localport=27019
Additionally, config servers need to allow incoming connections from all of the mongos.exe instances in the cluster and all mongod.exe instances in the cluster. Add rules that resemble the following:
netsh advfirewall firewall add rule name="Open mongod config svr inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=27019
Replace <ip-address> with the addresses of the mongos.exe instances and the shard mongod.exe instances.
| [1] | You can also run a config server by setting the configsvr option in a configuration file. |
For shard servers, running as mongod --shardsvr, [2] the default port number is 27018. You must configure the following Windows Firewall rules to allow traffic to and from each shard:
netsh advfirewall firewall add rule name="Open mongod shardsvr inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=27018
netsh advfirewall firewall add rule name="Open mongod shardsvr outbound" dir=out action=allow protocol=TCP remoteip=<ip-address> localport=27018
Replace the <ip-address> specification with the IP address of all mongod.exe instances. This allows you to permit incoming and outgoing traffic between all shards, including constituent replica set members, to:
Furthermore, shards need to be able to make outgoing connections to:
Create a rule that resembles the following, and replace the <ip-address> with the address of the config servers and the mongos.exe instances:
netsh advfirewall firewall add rule name="Open mongod config svr outbound" dir=out action=allow protocol=TCP remoteip=<ip-address> localport=27018
| [2] | You can also specify the shard server option using the shardsvr setting in the configuration file. Shard members are also often conventional replica sets using the default port. |
| [3] | All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations. |
The mongostat diagnostic tool, when running with the --discover option, needs to be able to reach all components of a cluster, including the config servers, the shard servers, and the mongos.exe instances.
If your monitoring system needs to access the HTTP interface, add the following rule to the chain:
netsh advfirewall firewall add rule name="Open mongod HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28017
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface. For all deployments, you should restrict access to this port to only the monitoring instance.
Optional
For shard server mongod.exe instances running with shardsvr, the rule would resemble the following:
netsh advfirewall firewall add rule name="Open mongos HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28018
For config server mongod.exe instances running with configsvr, the rule would resemble the following:
netsh advfirewall firewall add rule name="Open mongod configsvr HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28019
This section contains a number of basic operations for managing and using netsh. While you can use the GUI front ends to manage the Windows Firewall, all core functionality is accessible from netsh.
To delete the firewall rule allowing mongod.exe traffic:
netsh advfirewall firewall delete rule name="Open mongod port 27017" protocol=tcp localport=27017
netsh advfirewall firewall delete rule name="Open mongod shard port 27018" protocol=tcp localport=27018
To return a list of all Windows Firewall rules:
netsh advfirewall firewall show rule name=all
To simplify administration of a larger collection of systems, you can easily export and import firewall rules between servers on Windows:
Export all firewall rules with the following command:
netsh advfirewall export "C:\temp\MongoDBfw.wfw"
Replace "C:\temp\MongoDBfw.wfw" with a path of your choosing. You can use a command in the following form to import a file created using this operation:
netsh advfirewall import "C:\temp\MongoDBfw.wfw"
MongoDB provides a basic authentication system that you can enable with the auth and keyFile configuration settings. [1] See the authentication section of the Security Practices and Management document.
This document contains an overview of all operations related to authentication and managing a MongoDB deployment with authentication.
See
The Security Considerations section of the Run-time Database Configuration document for more information on configuring authentication.
| [1] | Use the --auth --keyFile options on the command line. |
When setting up authentication for the first time you must either:
Begin by setting up the first administrative user for the mongod instance.
| [2] | Because of SERVER-6591, you cannot add the first user to a sharded cluster using the localhost connection in 2.2. If you are running a 2.2 sharded cluster, and want to enable authentication, you must deploy the cluster and add the first user to the admin database before restarting the cluster to run with keyFile. |
About administrative users
Administrative users are those users that have “normal” or read and write access to the admin database.
If this is the first administrative user, [3] connect to the mongod on the localhost interface using the mongo shell. Then, issue the following command sequence to switch to the admin database context and add the administrative user:
use admin
db.addUser("<username>", "<password>")
Replace <username> and <password> with the credentials for this administrative user.
| [3] | You can also use this procedure if authentication is not enabled so that your database has an administrative user when you enable auth. |
To add a user with read and write access to a specific database, in this example the records database, connect to the mongod instance using the mongo shell, and issue the following sequence of operations:
use records
db.addUser("<username>", "<password>")
Replace <username> and <password> with the credentials for this user.
To add a user with read only access to a specific database, in this example the records database, connect to the mongod instance using the mongo shell, and issue the following sequence of operations:
use records
db.addUser("<username>", "<password>", true)
Replace <username> and <password> with the credentials for this user.
Although administrative accounts have access to all databases, these users must authenticate against the admin database before changing contexts to a second database, as in the following example:
Example
Assume a superAdmin user with the password Password123 and access to the admin database.
The following operation in the mongo shell will succeed:
use admin
db.auth("superAdmin", "Password123")
However, the following operation will fail:
use test
db.auth("superAdmin", "Password123")
Note
If you have authenticated to the admin database as a normal (read and write) user, logging into a different database as a read-only user will not invalidate the authentication to the admin database. In this situation, the client will be able to read and write data to this second database.
The behavior of mongod running with auth, when connecting from a client over the localhost interface (i.e. a client running on the same system as the mongod), varies slightly between versions before and after 2.2.
In general if there are no users for the admin database, you may connect via the localhost interface. For sharded clusters running version 2.2, if mongod is running with auth then all users connecting over the localhost interface must authenticate, even if there aren’t any users in the admin database.
In version 2.2 and earlier:
As a result, always use unique username and password combinations for each database.
| [4] | Read only users do not have access to the system.users collection. |
Thanks to Will Urbanski, from Dell SecureWorks, for identifying this issue.
The following sections, outline practices for enabling and managing authentication with specific MongoDB deployments:
Use the following command at the system shell to generate pseudo-random content for a key file:
openssl rand -base64 753
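Because the key file contains shared credentials, restrict access to it once it is generated; a sketch, using an illustrative path, follows:

```shell
# Generate random key material and make it readable by the owner only.
openssl rand -base64 753 > /tmp/mongodb-keyfile
chmod 600 /tmp/mongodb-keyfile
```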
Note
Be aware that MongoDB strips whitespace characters (e.g. \x0d, \x09, and \x20) for cross-platform convenience. As a result, the following keys are identical:
echo -e "my secret key" > key1
echo -e "my secret key\n" > key2
echo -e "my secret key" > key3
echo -e "my\r\nsecret\r\nkey\r\n" > key4
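You can check this equivalence yourself by stripping the whitespace characters before comparing; the following sketch recreates the first and last keys above (using printf rather than echo -e, for portability) and compares them:

```shell
printf 'my secret key\n' > /tmp/key1
printf 'my\r\nsecret\r\nkey\r\n' > /tmp/key4
# Remove the whitespace characters MongoDB ignores, then compare.
k1=$(tr -d ' \t\r\n' < /tmp/key1)
k4=$(tr -d ' \t\r\n' < /tmp/key4)
[ "$k1" = "$k4" ] && echo "identical"   # prints identical
```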
CRUD stands for create, read, update, and delete, the four core database operations used in database-driven application development. The CRUD Operations section provides an introduction to each class of operation along with complete examples of each operation. The documents in the Read and Write Operations section provide a higher level overview of the behavior and available functionality of these operations.
The Read Operations and Write Operations documents provide higher level introductions to and descriptions of the behavior of read and write operations for MongoDB deployments. The BSON Documents document provides an overview of documents and document-orientation in MongoDB.
Read operations include all operations that return a cursor in response to application requests for data (i.e. queries), as well as a number of aggregation operations that do not return a cursor but have similar properties. These commands include aggregate, count, and distinct.
This document describes the syntax and structure of the queries applications use to request data from MongoDB and how different factors affect the efficiency of reads.
Note
All of the examples in this document use the mongo shell interface. All of these operations are available in an idiomatic interface for each language by way of the MongoDB Driver. See your driver documentation for full API documentation.
In the mongo shell, the find() and findOne() methods perform read operations. The find() method has the following syntax: [1]
db.collection.find( <query>, <projection> )
The db.collection object specifies the database and collection to query. All queries in MongoDB address a single collection.
You can enter db in the mongo shell to return the name of the current database. Use the show collections operation in the mongo shell to list the current collections in the database.
Queries in MongoDB are BSON objects that use a set of query operators to describe query parameters.
The <query> argument of the find() method holds this query document. A read operation without a query document will return all documents in the collection.
The <projection> argument describes the result set in the form of a document. Projections specify or limit the fields to return.
Without a projection, the operation will return all fields of the documents. Specify a projection if your documents are large, or when your application needs only a subset of the available fields.
The order of documents returned by a query is not defined and is not necessarily consistent unless you specify a sort (sort()).
For example, the following operation on the inventory collection selects all documents where the type field equals 'food' and the price field has a value less than 9.95. The projection limits the response to the item, qty, and _id fields:
db.inventory.find( { type: 'food', price: { $lt: 9.95 } },
{ item: 1, qty: 1 } )
The findOne() method is similar to the find() method except the findOne() method returns a single document from a collection rather than a cursor. The method has the syntax:
db.collection.findOne( <query>, <projection> )
For additional documentation and examples of the main MongoDB read operators, refer to the Read page of the CRUD section.
| [1] | db.collection.find() is a wrapper for the more formal query structure with the $query operator. |
This section provides an overview of the query document for MongoDB queries. See the preceding section for more information on queries in MongoDB.
The following examples demonstrate the key properties of the query document in MongoDB queries, using the find() method from the mongo shell, and a collection of documents named inventory:
An empty query document ({}) selects all documents in the collection:
db.inventory.find( {} )
Not specifying a query document to the find() method is equivalent to specifying an empty query document. Therefore the following operation is equivalent to the previous operation:
db.inventory.find()
A single-clause query selects all documents in a collection where a field has a certain value. These are simple “equality” queries.
In the following example, the query selects all documents in the collection where the type field has the value snacks:
db.inventory.find( { type: "snacks" } )
A single-clause query document can also select all documents in a collection given a condition or set of conditions for one field in the collection’s documents. Use the query operators to specify conditions in a MongoDB query.
In the following example, the query selects all documents in the collection where the value of the type field is either 'food' or 'snacks':
db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
A compound query can specify conditions for more than one field in the collection’s documents. Implicitly, a logical AND conjunction connects the clauses of a compound query so that the query selects the documents in the collection that match all the conditions.
In the following example, the query document specifies an equality match on a single field, followed by a range of values for a second field using a comparison operator:
db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
This query selects all documents where the type field has the value 'food' and the value of the price field is less than ($lt) 9.95.
Using the $or operator, you can specify a compound query that joins each clause with a logical OR conjunction so that the query selects the documents in the collection that match at least one condition.
In the following example, the query document selects all documents in the collection where the field qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
db.inventory.find( { $or: [ { qty: { $gt: 100 } },
{ price: { $lt: 9.95 } } ]
} )
With additional clauses, you can specify precise conditions for matching documents. In the following example, the compound query document selects all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } },
{ price: { $lt: 9.95 } } ]
} )
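The query-document semantics described above (simple equality, comparison operators such as $lt and $gt, implicit AND between clauses, and explicit $or) can be sketched in a few lines of Python. This is an illustration of the matching rules only, not MongoDB's implementation; the in-memory list below stands in for the inventory collection:

```python
# Sketch of MongoDB query-document matching: implicit AND, comparison
# operators ($lt, $gt, $in), and logical $or. Illustration only.

def matches(doc, query):
    for field, cond in query.items():
        if field == '$or':
            # $or: at least one clause must match
            if not any(matches(doc, clause) for clause in cond):
                return False
        elif isinstance(cond, dict) and any(k.startswith('$') for k in cond):
            # operator expression, e.g. { $lt: 9.95 }
            value = doc.get(field)
            for op, operand in cond.items():
                if op == '$lt':
                    ok = value is not None and value < operand
                elif op == '$gt':
                    ok = value is not None and value > operand
                elif op == '$in':
                    ok = value in operand
                else:
                    raise ValueError('unsupported operator: ' + op)
                if not ok:
                    return False
        else:
            # simple equality clause; implicitly ANDed with the others
            if doc.get(field) != cond:
                return False
    return True

inventory = [
    {'item': 'apple',  'type': 'food',    'qty': 150, 'price': 4.50},
    {'item': 'fork',   'type': 'utensil', 'qty': 20,  'price': 3.00},
    {'item': 'caviar', 'type': 'food',    'qty': 200, 'price': 99.00},
]

# Like: db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
food_cheap = [d for d in inventory
              if matches(d, {'type': 'food', 'price': {'$lt': 9.95}})]

# Like: db.inventory.find( { type: 'food',
#           $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
food_or = [d for d in inventory
           if matches(d, {'type': 'food',
                          '$or': [{'qty': {'$gt': 100}},
                                  {'price': {'$lt': 9.95}}]})]
```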
When the field holds an embedded document (i.e. subdocument), you can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using dot notation, to specify values for individual fields in the subdocument:
Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order.
In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:
db.inventory.find( {
producer: {
company: 'ABC123',
address: '123 Street'
}
}
)
Equality matches for specific fields within subdocuments select documents when the subdocument contains the specified field with a matching value.
In the following example, the query uses the dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields:
db.inventory.find( { 'producer.company': 'ABC123' } )
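The difference between the two subdocument matching modes can be illustrated with a short Python sketch. The helpers and sample document are hypothetical stand-ins for MongoDB's behavior; note that whole-subdocument equality is sensitive to field order, while dot notation reaches a single field:

```python
# Sketch of the two subdocument matching modes. Illustration only,
# not MongoDB's implementation.

def exact_subdoc_match(doc, field, subdoc):
    # Whole-subdocument equality: fields AND their order must match,
    # as in db.inventory.find( { producer: { company: ..., address: ... } } )
    value = doc.get(field)
    return isinstance(value, dict) and list(value.items()) == list(subdoc.items())

def dotted_match(doc, dotted_field, target):
    # Dot notation: walk into the subdocument one path segment at a time,
    # as in db.inventory.find( { 'producer.company': 'ABC123' } )
    value = doc
    for part in dotted_field.split('.'):
        if not isinstance(value, dict) or part not in value:
            return False
        value = value[part]
    return value == target

doc = {'item': 'widget',
       'producer': {'company': 'ABC123', 'address': '123 Street'}}

same_order = {'company': 'ABC123', 'address': '123 Street'}
swapped    = {'address': '123 Street', 'company': 'ABC123'}
```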
When the field holds an array, you can query for values in the array, and if the array holds subdocuments, you can query for specific fields within these subdocuments using dot notation:
Equality matches can specify an entire array, to select an array that matches exactly. In the following example, the query matches all documents where the value of the field tags is an array and holds three elements, 'fruit', 'food', and 'citrus', in this order:
db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } )
Equality matches can specify a single element in the array: if the array contains at least one element with the specified value, the query matches the document. In the following example, the query matches all documents where the value of the field tags is an array that contains, as one of its elements, the element 'fruit':
db.inventory.find( { tags: 'fruit' } )
Equality matches can also select documents by the array index (i.e. position) of an element in the array. In the following example, the query uses dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit':
db.inventory.find( { 'tags.0' : 'fruit' } )
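The three array matching modes above can be sketched with a hypothetical Python helper; this illustrates the semantics only and is not MongoDB code:

```python
# Sketch of equality matching against an array field: whole-array match,
# match-any-element, and positional ('tags.0') match. Illustration only.

def array_matches(doc, field, target):
    if '.' in field:
        # positional match, e.g. 'tags.0'
        name, idx = field.split('.', 1)
        arr = doc.get(name)
        return (isinstance(arr, list)
                and int(idx) < len(arr)
                and arr[int(idx)] == target)
    value = doc.get(field)
    if isinstance(target, list):
        # whole-array match: same elements in the same order
        return value == target
    # scalar target: match if any element equals it
    return isinstance(value, list) and target in value

doc = {'tags': ['fruit', 'food', 'citrus']}
```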
In the following examples, consider an array that contains subdocuments:
If you know the array index of the subdocument, you can specify the document using the subdocument’s position.
The following example selects all documents where the memos field contains an array whose first element (i.e. index 0) is a subdocument with the field by set to the value 'shipping':
db.inventory.find( { 'memos.0.by': 'shipping' } )
If you do not know the index position of the subdocument, concatenate the name of the field that contains the array, with a dot (.) and the name of the field in the subdocument.
The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':
db.inventory.find( { 'memos.by': 'shipping' } )
To match by multiple fields in the subdocument, you can use either dot notation or the $elemMatch operator:
The following example uses dot notation to query for documents where the value of the memos field is an array that has at least one subdocument that contains the field memo equal to 'on time' and the field by equal to 'shipping':
db.inventory.find(
{
'memos.memo': 'on time',
'memos.by': 'shipping'
}
)
The following example uses $elemMatch to query for documents where the value of the memos field is an array that has at least one subdocument that contains the field memo equal to 'on time' and the field by equal to 'shipping':
db.inventory.find( { memos: {
$elemMatch: {
memo : 'on time',
by: 'shipping'
}
}
}
)
Refer to the Query, Update, Projection, and Aggregation Operators document for the complete list of query operators.
The projection specification limits the fields to return for all matching documents. Constraining the result set by restricting the fields to return can minimize network transit costs and the costs of deserializing documents in the application layer.
The second argument to the find() method is a projection, and it takes the form of a document with a list of fields for inclusion or exclusion from the result set. You can either specify the fields to include (e.g. { field: 1 }) or specify the fields to exclude (e.g. { field: 0 }). The _id field is implicitly included, unless explicitly excluded.
Note
You cannot combine inclusion and exclusion semantics in a single projection with the exception of the _id field.
Consider the following projection specifications in find() operations:
If you specify no projection, the find() method returns all fields of all documents that match the query.
db.inventory.find( { type: 'food' } )
This operation will return all documents in the inventory collection where the value of the type field is 'food'.
A projection can explicitly include several fields. In the following operation, the find() method returns all documents that match the query, projecting only the item and qty fields. The results also include the _id field:
db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
You can remove the _id field by excluding it from the projection, as in the following example:
db.inventory.find( { type: 'food' }, { item: 1, qty: 1, _id:0 } )
This operation returns all documents that match the query, and only includes the item and qty fields in the result set.
To exclude a single field or group of fields you can use a projection in the following form:
db.inventory.find( { type: 'food' }, { type:0 } )
This operation returns all documents where the value of the type field is food, but does not include the type field in the output.
With the exception of the _id field you cannot combine inclusion and exclusion statements in projection documents.
The $elemMatch and $slice projection operators provide more control when projecting only a portion of an array.
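The projection rules above (include mode, exclude mode, the default _id, and the prohibition on mixing the two modes) can be sketched in Python. This is an illustration of the semantics, not MongoDB's implementation:

```python
# Sketch of projection semantics: include mode vs. exclude mode, with
# _id included by default. Illustration only.

def project(doc, projection):
    if not projection:
        return dict(doc)
    include = [f for f, v in projection.items() if v == 1 and f != '_id']
    exclude = [f for f, v in projection.items() if v == 0]
    if include and [f for f in exclude if f != '_id']:
        # only _id may be excluded alongside included fields
        raise ValueError('cannot mix inclusion and exclusion (except _id)')
    if include:
        result = {f: doc[f] for f in include if f in doc}
        # _id is implicitly included unless explicitly excluded
        if projection.get('_id', 1) != 0 and '_id' in doc:
            result['_id'] = doc['_id']
        return result
    # exclusion mode: return everything except the excluded fields
    return {f: v for f, v in doc.items() if f not in exclude}

doc = {'_id': 1, 'item': 'apple', 'qty': 5, 'type': 'food'}
```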
Indexes improve the efficiency of read operations by reducing the amount of data that query operations need to process, thereby simplifying the work of fulfilling queries within MongoDB. An index is a special data structure that MongoDB maintains as documents are inserted or modified; a given index can support and optimize specific queries and sort operations, and can allow for more efficient use of storage. For more information about indexes in MongoDB see: Indexes and Indexing Overview.
You can create indexes using the db.collection.ensureIndex() method in the mongo shell, as in the following prototype operation:
db.collection.ensureIndex( { <field1>: <order>, <field2>: <order>, ... } )
The field specifies the field to index. The field may be a field from a subdocument, using dot notation to specify subdocument fields.
You can create an index on a single field or a compound index that includes multiple fields in the index.
The order option specifies either ascending ( 1 ) or descending ( -1 ) index order.
MongoDB can read the index in either direction. In most cases, you only need to specify indexing order to support sort operations in compound queries.
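Why an index helps can be illustrated with a Python sketch that uses a sorted list of (key, position) pairs in place of MongoDB's B-tree. The collection and data below are hypothetical, and the sketch only illustrates the range-scan idea; because the list is sorted, it could also be read in reverse to satisfy a descending sort:

```python
# An index lets the server range-scan a sorted structure instead of
# scanning the whole collection. Sketch only; not MongoDB's B-tree.
import bisect

docs = [{'type': t, 'n': i} for i, t in
        enumerate(['food', 'utensil', 'food', 'paper', 'food'])]

# "Create an index" on type: sorted (key, document position) pairs.
index = sorted((d['type'], i) for i, d in enumerate(docs))
keys = [k for k, _ in index]

def find_by_type(t):
    # Binary-search for the first and last matching key,
    # then read only the documents in that range.
    lo = bisect.bisect_left(keys, t)
    hi = bisect.bisect_right(keys, t)
    return [docs[pos] for _, pos in index[lo:hi]]
```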
The explain() cursor method allows you to inspect the operation of the query system, and is useful for analyzing the efficiency of queries, and for determining how the query uses the index. Call the explain() method on a cursor returned by find(), as in the following example:
db.inventory.find( { type: 'food' } ).explain()
Note
Only use explain() to test the query operation, and not the timing of query performance. Because explain() attempts multiple query plans, it does not reflect accurate query performance.
If the above operation could not use an index, the output of explain() would resemble the following:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 5,
"nscannedObjects" : 4000006,
"nscanned" : 4000006,
"nscannedObjectsAllPlans" : 4000006,
"nscannedAllPlans" : 4000006,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2,
"nChunkSkips" : 0,
"millis" : 1591,
"indexBounds" : { },
"server" : "mongodb0.example.net:27017"
}
The BasicCursor value in the cursor field confirms that this query does not use an index. The nscannedObjects value shows that MongoDB must scan 4,000,006 documents to return only 5 documents. To increase the efficiency of the query, create an index on the type field, as in the following example:
db.inventory.ensureIndex( { type: 1 } )
Run the explain() operation, as follows, to test the use of the index:
db.inventory.find( { type: 'food' } ).explain()
Consider the results:
{
"cursor" : "BtreeCursor type_1",
"isMultiKey" : false,
"n" : 5,
"nscannedObjects" : 5,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 5,
"nscannedAllPlans" : 5,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : { "type" : [
[ "food",
"food" ]
] },
    "server" : "mongodb0.example.net:27017" }
The BtreeCursor value of the cursor field indicates that the query used an index. This query:
returned 5 documents, as indicated by the n field;
scanned 5 documents from the index, as indicated by the nscanned field;
then read 5 full documents from the collection, as indicated by the nscannedObjects field.
This indicates that the query was not “covered,” or able to complete using only the index, as reflected in the indexOnly field. See Create Indexes that Support Covered Queries for more information.
The MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs. The query optimizer occasionally reevaluates query plans as the content of the collection changes to ensure optimal query plans.
To create a new query plan, the query optimizer:
runs the query against several indexes in parallel.
records the matches in a single common buffer, as though the results all came from the same index.
If an index returns a result already returned by another index, the optimizer skips the duplicate match.
selects an index when either of the following occurs:
The selected index becomes the index specified in the query plan; future iterations of this query or queries with the same query pattern will use this index. Query pattern refers to query select conditions that differ only in the values, as in the following two queries with the same query pattern:
db.inventory.find( { type: 'food' } )
db.inventory.find( { type: 'utensil' } )
To manually compare the performance of a query using more than one index, you can use the hint() and explain() methods in conjunction, as in the following prototype:
db.collection.find().hint().explain()
The following operations each run the same query but will reflect the use of the different indexes:
db.inventory.find( { type: 'food' } ).hint( { type: 1 } ).explain()
db.inventory.find( { type: 'food' } ).hint( { type: 1, name: 1 }).explain()
This returns the statistics regarding the execution of the query. For more information on the output of explain() see the Explain Output.
Note
If you run explain() without including hint(), the query optimizer reevaluates the query and runs against multiple indexes before returning the query statistics.
As collections change over time, the query optimizer deletes a query plan and reevaluates it after certain events.
For more information, see Indexing Strategies.
Some query operations cannot use indexes effectively or cannot use indexes at all. Consider the following situations:
The inequality operators $nin and $ne are not very selective, as they often match a large portion of the index.
As a result, in most cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.
Queries that specify regular expressions, using either inline JavaScript regular expressions or $regex operator expressions, cannot use an index. However, a regular expression anchored to the beginning of a string can use an index.
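The reason an anchored regular expression can use an index is that a literal prefix corresponds to a contiguous range of index keys. The following Python sketch (hypothetical data; not MongoDB code) shows the range-scan idea behind this optimization:

```python
# A ^-anchored pattern's literal prefix maps to a contiguous range of
# sorted index keys, so the server can range-scan instead of testing
# every key. Illustration only.
import bisect

sorted_keys = sorted(['apple', 'apricot', 'banana', 'grape', 'grapefruit'])

def prefix_range_scan(prefix):
    # Upper bound: the prefix with its last character incremented,
    # e.g. 'grape' covers the half-open key range [ 'grape', 'grapf' ).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]
```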
The find() method returns a cursor to the results; however, in the mongo shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times [2] to print up to the first 20 documents that match the query, as in the following example:
db.inventory.find( { type: 'food' } );
When you assign the cursor returned by find() to a variable:
you can call the cursor variable in the shell to iterate up to 20 times [2] and print the matching documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
myCursor
you can use the cursor method next() to access the documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor.hasNext() ? myCursor.next() : null;
if (myDocument) {
var myItem = myDocument.item;
print(tojson(myItem));
}
As an alternative print operation, consider the printjson() helper method to replace print(tojson()):
if (myDocument) {
var myItem = myDocument.item;
printjson(myItem);
}
you can use the cursor method forEach() to iterate the cursor and access the documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
myCursor.forEach(printjson);
See JavaScript cursor methods and your driver documentation for more information on cursor methods.
| [2] | (1, 2) You can use the DBQuery.shellBatchSize property to change the number of iterations from the default value 20. |
In the mongo shell, you can use the toArray() method to iterate the cursor and return the documents in an array, as in the following:
var myCursor = db.inventory.find( { type: 'food' } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];
The toArray() method loads into RAM all documents returned by the cursor; the toArray() method exhausts the cursor.
Additionally, some drivers provide access to the documents by using an index on the cursor (i.e. cursor[index]). This is a shortcut for first calling the toArray() method and then using an index on the resulting array.
Consider the following example:
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor[3];
myCursor[3] is equivalent to the following example:
myCursor.toArray()[3];
Consider the following behaviors related to cursors:
By default, the server will automatically close the cursor after 10 minutes of inactivity or if the client has exhausted the cursor. To override this behavior, you can specify the noTimeout wire protocol flag in your query; however, you should either close the cursor manually or exhaust the cursor. In the mongo shell, you can set the noTimeout flag:
var myCursor = db.inventory.find().addOption(DBQuery.Option.noTimeout);
See your driver documentation for information on setting the noTimeout flag. See Cursor Flags for a complete list of available cursor flags.
Because the cursor is not isolated during its lifetime, intervening write operations may result in a cursor that returns a single document [3] more than once. To handle this situation, see the information on snapshot mode.
The MongoDB server returns the query results in batches:
For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batch size is 4 megabytes. To override the default size of the batch, see cursor.batchSize() and cursor.limit().
For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort and will return all documents in the first batch.
Batch size will not exceed the maximum BSON document size.
As you iterate through the cursor and reach the end of the returned batch, if there are more results, cursor.next() will perform a getmore operation to retrieve the next batch.
To see how many documents remain in the batch as you iterate the cursor, you can use the cursor.objsLeftInBatch() method, as in the following example:
var myCursor = db.inventory.find();
var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
myCursor.objsLeftInBatch();
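The batching behavior can be simulated with a small Python class (an illustration, not a driver implementation). The batch size here is 3 rather than the real defaults, and the _getmore method stands in for the wire-protocol getmore operation:

```python
# Sketch of batched cursor behavior: the server returns results in
# batches, and iterating past the end of a batch triggers a "getmore".
# Illustration only.

class BatchedCursor:
    def __init__(self, results, batch_size=3):
        self._results = results
        self._batch_size = batch_size
        self._batch = []
        self._getmores = 0

    def _getmore(self):
        # Stand-in for fetching the next batch from the server.
        self._getmores += 1
        self._batch = self._results[:self._batch_size]
        self._results = self._results[self._batch_size:]

    def has_next(self):
        return bool(self._batch) or bool(self._results)

    def next(self):
        if not self._batch:
            self._getmore()
        return self._batch.pop(0)

    def objs_left_in_batch(self):
        # Analogous to cursor.objsLeftInBatch() in the mongo shell.
        return len(self._batch)

cursor = BatchedCursor(list(range(7)))
first = cursor.next()
```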
You can use the cursorInfo command to retrieve information about open cursors:
Consider the following example:
db.runCommand( { cursorInfo: 1 } )
The command returns a document of the following form:
{ "totalOpen" : <number>, "clientCursors_size" : <number>, "timedOut" : <number>, "ok" : 1 }
| [3] | A single document relative to value of the _id field. A cursor cannot return the same document more than once if the document has not changed. |
The mongo shell provides the following cursor flags:
Changed in version 2.2.
MongoDB can perform some basic data aggregation operations on results before returning data to the application. These operations are not queries; they use database commands rather than queries, and they do not return a cursor. However, they still require MongoDB to read data.
Running aggregation operations on the database side can be more efficient than running them in the application layer and can reduce the amount of data MongoDB needs to send to the application. These aggregation operations include basic grouping, counting, and even processing data using a map-reduce framework. Additionally, in 2.2 MongoDB provides a complete aggregation framework for richer aggregation operations.
The aggregation framework provides users with a pipeline-like framework: documents enter from a collection and pass through a series of pipeline operators that manipulate and transform them until they are output at the end. The aggregation framework is accessible via the aggregate command or the db.collection.aggregate() helper in the mongo shell.
For more information on the aggregation framework see Aggregation.
Additionally, MongoDB provides a number of simpler commands for basic data aggregation operations:
Sharded clusters allow you to partition a data set among a cluster of mongod instances in a way that is nearly transparent to the application. See the Sharding section of this manual for additional information about these deployments.
For a sharded cluster, you issue all operations to one of the mongos instances associated with the cluster. mongos instances route operations to the mongod in the cluster and behave like mongod instances to the application. Read operations to a sharded collection in a sharded cluster are largely the same as operations to a replica set or standalone instances. See the section on Read Operations in Sharded Clusters for more information.
In sharded deployments, the mongos instance routes the queries from the clients to the mongod instances that hold the data, using the cluster metadata stored in the config database.
For sharded collections, if queries do not include the shard key, the mongos must direct the query to all shards in a collection. These scatter-gather queries can be inefficient, particularly on larger clusters, and are infeasible for routine operations.
For more information on read operations in sharded clusters, consider the following resources:
Replica sets use read preferences to determine where and how to route read operations to members of the replica set. By default, MongoDB always reads data from a replica set’s primary. You can modify that behavior by changing the read preference mode.
You can configure the read preference mode on a per-connection or per-operation basis to allow reads from secondaries.
Read operations from secondary members of replica sets are not guaranteed to reflect the current state of the primary, and the state of secondaries will trail the primary by some amount of time. Often, applications don’t rely on this kind of strict consistency, but application developers should always consider the needs of their application before setting read preference.
For more information on read preferences or on the read preference modes, see Read Preference and Read Preference Modes.
All operations that create or modify data in the MongoDB instance are write operations. MongoDB represents data as BSON documents stored in collections. Write operations target one collection and are atomic on the level of a single document: no single write operation can atomically affect more than one document or more than one collection.
This document introduces the write operators available in MongoDB as well as presents strategies to increase the efficiency of writes in applications.
For information on write operators and how to write data to a MongoDB database, see the following pages:
For information on specific methods used to perform write operations in the mongo shell, see the following:
For information on how to perform write operations from within an application, see the Drivers documentation or the documentation for your client library.
Note
The driver write concern change created a new connection class in all of the MongoDB drivers, called MongoClient with a different default write concern. See the release notes for this change, and the release notes for the driver you’re using for more information about your driver’s release.
Clients issue write operations with some level of write concern, which describes the level of concern or guarantee the server will provide in its response to a write operation. Consider the following levels of conceptual write concern:
errors ignored: Write operations are not acknowledged by MongoDB, and may not succeed in the case of connection errors that the client is not yet aware of, or if the mongod produces an exception (e.g. a duplicate key exception for unique indexes). While this level is efficient because it does not require the database to respond to every write operation, it also incurs significant risk with regard to the persistence and durability of the data.
Warning
Do not use this option in normal operation.
unacknowledged: MongoDB does not acknowledge the receipt of write operations, as with a write concern level of errors ignored; however, the driver will receive and handle network errors, as permitted by the system's networking configuration.
Before the releases outlined in Default Write Concern Change, this was the default write concern.
receipt acknowledged: The mongod will confirm the receipt of the write operation, allowing the client to catch network, duplicate key, and other exceptions. After the releases outlined in Default Write Concern Change, this is the default write concern.
journaled: The mongod will confirm the write operation only after it has written the operation to the journal. This confirms that the write operation can survive a mongod shutdown and ensures that the write operation is durable.
While receipt acknowledged without journaled provides the fundamental basis for write concern, there is a window of up to 100 milliseconds between journal commits during which the write operation is not fully durable. Require journaled as part of the write concern to provide this durability guarantee.
Replica sets present an additional layer of consideration for write concern. Basic write concern levels affect the write operation on only one mongod instance. The w argument to getLastError provides a replica acknowledged level of write concern. With replica acknowledged you can guarantee that the write operation has propagated to the members of a replica set. See the Write Concern for Replica Sets for more information.
Note
Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the primary of the set regardless of the level of replica acknowledged write concern.
| [1] | The default write concern is to call getLastError with no arguments. For replica sets, you can define the default write concern settings in the getLastErrorDefaults. If getLastErrorDefaults does not define a default write concern setting, getLastError defaults to basic receipt acknowledgment. |
To provide write concern, drivers issue the getLastError command after a write operation and receive a document with information about the last operation. This document's err field contains either null, if the operation completed successfully, or a description of the error.
The definition of a “successful write” depends on the arguments specified to getLastError, or in replica sets, the configuration of getLastErrorDefaults. When deciding the level of write concern for your application, become familiar with the Operational Considerations and Write Concern.
The getLastError command has the following options to configure write concern requirements:
j or “journal” option
This option confirms that the mongod instance has written the data to the on-disk journal and ensures data is not lost if the mongod instance shuts down unexpectedly. Set to true to enable, as shown in the following example:
db.runCommand( { getLastError: 1, j: true } )
If you set journal to true, and the mongod does not have journaling enabled, as with nojournal, then getLastError will provide basic receipt acknowledgment, and will include a jnote field in its return document.
w option
This option provides the ability to disable write concern entirely as well as specifies the write concern operations for replica sets. See Operational Considerations and Write Concern for an introduction to the fundamental concepts of write concern. By default, the w option is set to 1, which provides basic receipt acknowledgment on a single mongod instance or on the primary in a replica set.
The w option takes the following values:
-1:
Disables all acknowledgment of write operations and suppresses all errors, including network and socket errors.
0:
Disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors to the application.
Note
If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal commit prevails, and the driver will require that mongod acknowledge the write operation.
1:
Provides acknowledgment of write operations on a standalone mongod or the primary in a replica set.
A number greater than 1:
Guarantees that write operations have propagated successfully to the specified number of replica set members including the primary. If you set w to a number that is greater than the number of set members that hold data, MongoDB waits for the non-existent members to become available, which means MongoDB blocks indefinitely.
majority:
Confirms that write operations have propagated to a majority of the configured replica set: a majority of the set's members must acknowledge the write operation before it succeeds. This ensures that the write operation will never be subject to a rollback in the course of normal operation, and furthermore allows you to avoid hard-coding assumptions about the size of your replica set into your application.
A tag set:
By specifying a tag set you can have fine-grained control over which replica set members must acknowledge a write operation to satisfy the required level of write concern.
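The numeric and majority levels above can be summarized with a hypothetical Python helper (not part of any driver); it maps a w value to the number of members that must acknowledge a write, with None meaning the driver expects no acknowledgment at all:

```python
# Summary of the w write-concern levels as a hypothetical helper.
# Illustration only; not part of any MongoDB driver.

def required_acks(w, replica_set_members):
    if w == -1:
        return None   # no acknowledgment, and all errors suppressed
    if w == 0:
        return 0      # no acknowledgment, but network errors surface
    if w == 'majority':
        # a strict majority of the replica set's members
        return replica_set_members // 2 + 1
    if isinstance(w, int) and w >= 1:
        # note: a w greater than the member count blocks indefinitely
        return w
    raise ValueError('unsupported w value: %r' % (w,))
```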
getLastError also supports a wtimeout setting which allows clients to specify a timeout for the write concern: if you don’t specify wtimeout and the mongod cannot fulfill the write concern the getLastError will block, potentially forever.
For more information on write concern and replica sets, see Write Concern for Replica Sets.
In sharded clusters, mongos instances will pass write concern on to the shard mongod instances.
In some situations you may need to insert or ingest a large amount of data into a MongoDB database. These bulk inserts have some special considerations that are different from other write operations.
The insert() method, when passed an array of documents, will perform a bulk insert, and inserts each document atomically. Drivers provide their own interface for this kind of operation.
New in version 2.2: insert() in the mongo shell gained support for bulk inserts in version 2.2.
Bulk insert can significantly increase performance by amortizing write concern costs. In the drivers, you can configure write concern for batches rather than on a per-document level.
Drivers also have a ContinueOnError option in their insert operation, so that the bulk operation will continue to insert remaining documents in a batch even if an insert fails.
Note
New in version 2.0: Support for ContinueOnError depends on version 2.0 of the core mongod and mongos components.
If the bulk insert process generates more than one error in a batch job, the client will only receive the most recent error. All bulk operations to a sharded collection run with ContinueOnError, which applications cannot disable. See the Strategies for Bulk Inserts in Sharded Clusters section for more information on considerations for bulk inserts in sharded clusters.
See your driver documentation for details on performing bulk inserts in your application. Also consider the following resources: Sharded Clusters, Strategies for Bulk Inserts in Sharded Clusters, and Importing and Exporting MongoDB Data.
After every insert, update, or delete operation, MongoDB must update every index associated with the collection in addition to the data itself. Therefore, every index on a collection adds some amount of overhead for the performance of write operations. [2]
In general, the performance gains that indexes provide for read operations are worth the insertion penalty; however, when optimizing write performance, be careful when creating new indexes and always evaluate the indexes on the collection and ensure that your queries are actually using these indexes.
For more information on indexes in MongoDB consider Indexes and Indexing Strategies.
| [2] | For inserts and updates to un-indexed fields, the overhead for sparse indexes is less than for non-sparse indexes. Also, for non-sparse indexes, updates that do not change the record size have less indexing overhead. |
When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record.
No other operations are atomic; however, you can attempt to isolate a write operation that affects multiple documents using the isolation operator.
To isolate a sequence of write operations from other read and write operations, see Perform Two Phase Commits.
In replica sets, all write operations go to the set's primary, which applies the write operation and then records the operation on the primary's operation log or oplog. The oplog is a reproducible sequence of operations to the data set. Secondary members of the set continuously replicate the oplog and apply the operations to themselves in an asynchronous process.
Large volumes of write operations, particularly bulk operations, may create situations where the secondary members have difficulty applying the replicated operations from the primary at a sufficient rate: this can cause the secondaries' state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for normal operation of the replica set, particularly failover in the form of rollbacks as well as general read consistency.
To help avoid this issue, you can customize the write concern to return confirmation of the write operation to another member [3] of the replica set every 100 or 1,000 operations. This provides an opportunity for secondaries to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the secondaries can maintain a largely current state with respect to the primary.
For more information on replica sets and write operations, see Write Concern, Oplog, Oplog Internals, and Changing Oplog Size.
| [3] | Calling getLastError intermittently with a w value of 2 or majority will slow the throughput of write traffic; however, this practice will allow the secondaries to remain current with the state of the primary. |
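The periodic-confirmation pattern above can be sketched in plain JavaScript; insertOne and confirmReplication are hypothetical stand-ins for your driver's insert and getLastError calls, and the interval of 1,000 operations is one of the values suggested above:

```javascript
// Insert documents in bulk, but every `interval` operations block
// until at least one secondary (w: 2) has acknowledged the writes,
// giving secondaries a chance to catch up with the primary.
function bulkInsertWithCheckpoints(docs, interval, insertOne, confirmReplication) {
    var checkpoints = 0;
    for (var i = 0; i < docs.length; i++) {
        insertOne(docs[i]);
        if ((i + 1) % interval === 0) {
            confirmReplication({ getLastError: 1, w: 2 });
            checkpoints++;
        }
    }
    return checkpoints;
}
```

With 2,500 documents and an interval of 1,000, for example, the loop pauses twice for replication acknowledgment.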
In a sharded cluster, MongoDB directs a given write operation to a shard and then performs the write on a particular chunk on that shard. Shards and chunks are range-based. Shard keys affect how MongoDB distributes documents among shards. Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster.
For more information, see Sharded Cluster Administration and Bulk Inserts.
MongoDB is a document-based database system, and as a result, all records, or data, in MongoDB are documents. Documents are the default representation of most user accessible data structures in the database. Documents provide structure for data in the following MongoDB contexts:
MongoDB documents are BSON objects with support for the full range of BSON types; however, BSON documents are conceptually similar to JSON objects and have the following structure:
{
field1: value1,
field2: value2,
field3: value3,
...
fieldN: valueN
}
Having support for the full range of BSON types, MongoDB documents may contain field and value pairs where the value can be another document, an array, an array of documents as well as the basic types such as Double, String, and Date. See also BSON Type Considerations.
Consider the following document that contains values of varying types:
var mydoc = {
_id: ObjectId("5099803df3f4948bd2f98391"),
name: { first: "Alan", last: "Turing" },
birth: new Date('Jun 23, 1912'),
death: new Date('Jun 07, 1954'),
contribs: [ "Turing machine", "Turing test", "Turingery" ],
views : NumberLong(1250000)
}
The document contains the following fields:
All field names are strings in BSON documents. Be aware that there are some restrictions on field names for BSON documents: field names cannot contain null characters, dots (.), or dollar signs ($).
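These restrictions can be expressed as a small check; isValidFieldName below is a hypothetical helper for illustration, not a MongoDB API:

```javascript
// A field name is invalid if it contains a null character, a dot,
// or a dollar sign.
function isValidFieldName(name) {
    return !/[\u0000.$]/.test(name);
}

isValidFieldName('name');        // true
isValidFieldName('name.last');   // false: contains a dot
isValidFieldName('$set');        // false: contains a dollar sign
```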
To determine the type of fields, the mongo shell provides the following operators:
Example
Consider the following operations using instanceof and typeof:
The following operation tests whether the _id field is of type ObjectId:
mydoc._id instanceof ObjectId
The operation returns true.
The following operation returns the type of the _id field:
typeof mydoc._id
In this case typeof will return the more generic object type rather than ObjectId type.
MongoDB uses the dot notation to access the elements of an array and to access the fields of a subdocument.
To access an element of an array by the zero-based index position, you concatenate the array name with the dot (.) and zero-based index position:
'<array>.<index>'
To access a field of a subdocument with dot-notation, you concatenate the subdocument name with the dot (.) and the field name:
'<subdocument>.<field>'
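As a sketch of how such paths resolve, the following plain-JavaScript helper (resolve is hypothetical, not a MongoDB API) walks a document along a dot-notation string:

```javascript
// Split the path on dots and descend one key at a time; array
// elements are reached through their zero-based index.
function resolve(doc, path) {
    return path.split('.').reduce(function (current, key) {
        return current == null ? undefined : current[key];
    }, doc);
}

var mydoc = {
    name: { first: "Alan", last: "Turing" },
    contribs: [ "Turing machine", "Turing test", "Turingery" ]
};

resolve(mydoc, 'name.last');    // "Turing"
resolve(mydoc, 'contribs.0');   // "Turing machine"
```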
See also
Most documents in MongoDB collections store data from users' applications.
These documents have the following attributes:
The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS.
Documents have the following restrictions on field names:
Note
Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id field and generate the ObjectId.
The following document specifies a record in a collection:
{
    _id: 1,
    name: { first: 'John', last: 'Backus' },
    birth: new Date('Dec 03, 1924'),
    death: new Date('Mar 17, 2007'),
    contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
    awards: [
        { award: 'National Medal of Science', year: 1975, by: 'National Science Foundation' },
        { award: 'Turing Award', year: 1977, by: 'ACM' }
    ]
}
The document contains the following fields:
Consider the following behavior and constraints of the _id field in MongoDB documents:
Consider the following options for the value of an _id field:
Use an ObjectId. See the ObjectId documentation.
Although it is common to assign ObjectId values to _id fields, if your objects have a natural unique identifier, consider using that for the value of _id to save space and to avoid an additional index.
Generate a sequence number for the documents in your collection in your application and use this value for the _id value. See the Create an Auto-Incrementing Sequence Field tutorial for an implementation pattern.
Generate a UUID in your application code. For efficiency, store the UUID as a value of the BSON BinData type to reduce the size of UUID values as stored in the collection and in the _id index.
Use your driver’s BSON UUID facility to generate UUIDs. Be aware that driver implementations may implement UUID serialization and deserialization logic differently, which may not be fully compatible with other drivers. See your driver documentation for information concerning UUID interoperability.
Query documents specify the conditions that determine which records to select for read, update, and delete operations. You can use <field>:<value> expressions to specify the equality condition and query operator expressions to specify additional conditions.
When passed as an argument to methods such as the find() method, the remove() method, or the update() method, the query document selects documents for MongoDB to return, remove, or update, as in the following:
db.bios.find( { _id: 1 } )
db.bios.remove( { _id: { $gt: 3 } } )
db.bios.update( { _id: 1, name: { first: 'John', last: 'Backus' } },
<update>,
<options> )
See also
Update documents specify the data modifications to perform during an update() operation to modify existing records in a collection. You can use update operators to specify the exact actions to perform on the document fields.
Consider the update document example:
{
$set: { 'name.middle': 'Warner' },
$push: { awards: { award: 'IBM Fellow',
year: '1963',
by: 'IBM' }
}
}
When passed as an argument to the update() method, the update actions document:
db.bios.update(
{ _id: 1 },
{
$set: { 'name.middle': 'Warner' },
$push: { awards: {
award: 'IBM Fellow',
year: '1963',
by: 'IBM'
}
}
}
)
See also
For additional examples of updates that involve array elements, including where the elements are documents, see the $ positional operator.
Index specification documents describe the fields to index on during the index creation. See indexes for an overview of indexes. [1]
Index documents contain field and value pairs, in the following form:
{ field: value }
The following document specifies a compound index on the _id field and the last field contained in the subdocument name field. The document uses dot notation to access a field in a subdocument:
{ _id: 1, 'name.last': 1 }
When passed as an argument to the ensureIndex() method, the index document specifies the index to create:
db.bios.ensureIndex( { _id: 1, 'name.last': 1 } )
| [1] | Indexes optimize a number of key read and write operations. |
Sort order documents specify the order of documents that a query returns. Pass sort order specification documents as an argument to the sort() method. See the sort() page for more information on sorting.
The sort order documents contain field and value pairs, in the following form:
{ field: value }
The following document specifies a sort order using fields from the name sub-document: first sort by the last field ascending, then by the first field, also ascending:
{ 'name.last': 1, 'name.first': 1 }
When passed as an argument to the sort() method, the sort order document sorts the results of the find() method:
db.bios.find().sort( { 'name.last': 1, 'name.first': 1 } )
The following BSON types require special consideration:
ObjectIds are small, likely unique, fast to generate, and ordered. These values consist of 12 bytes, where the first 4 bytes are a timestamp that reflects the ObjectId's creation. Refer to the ObjectId documentation for more information.
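Because the leading 4 bytes are a timestamp, the creation time can be read back from the hexadecimal representation; objectIdTimestamp below is a hypothetical helper for illustration:

```javascript
// The first 8 hex characters (4 bytes) of an ObjectId encode its
// creation time as seconds since the Unix epoch.
function objectIdTimestamp(hex) {
    return new Date(parseInt(hex.substring(0, 8), 16) * 1000);
}

objectIdTimestamp("5099803df3f4948bd2f98391").getUTCFullYear();   // 2012
```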
BSON strings are UTF-8. In general, drivers for each programming language convert from the language’s string format to UTF-8 when serializing and deserializing BSON. This makes it possible to store most international characters in BSON strings with ease. [2] In addition, MongoDB $regex queries support UTF-8 in the regex string.
| [2] | Given strings using UTF-8 character sets, using sort() on strings will be reasonably correct; however, because internally sort() uses the C++ strcmp API, the sort order may handle some characters incorrectly. |
BSON has a special timestamp type for internal MongoDB use that is not associated with the regular Date type. Timestamp values are a 64-bit value where the first 32 bits are a time_t value (seconds since the Unix epoch) and the second 32 bits are an incrementing ordinal for operations within a given second.
Within a single mongod instance, timestamp values are always unique.
In replication, the oplog has a ts field. The values in this field reflect the operation time, which uses a BSON timestamp value.
Note
The BSON Timestamp type is for internal MongoDB use. For most cases, in application development, you will want to use the BSON date type. See Date for more information.
If you create a BSON Timestamp using the empty constructor (e.g. new Timestamp()), MongoDB will only generate a timestamp if you use the constructor in the first field of the document. [3] Otherwise, MongoDB will generate an empty timestamp value (i.e. Timestamp(0, 0)).
Changed in version 2.1: The mongo shell displays the Timestamp value with the wrapper:
Timestamp(<time_t>, <ordinal>)
Prior to version 2.1, the mongo shell displayed the Timestamp value as a document:
{ t : <time_t>, i : <ordinal> }
| [3] | If the first field in the document is _id, then you can generate a timestamp in the second field of a document. In the following example, MongoDB will generate a Timestamp value, even though the Timestamp() constructor is not in the first field in the document: db.bios.insert( { _id: 9, last_updated: new Timestamp() } )
|
BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). The official BSON specification refers to the BSON Date type as the UTC datetime.
Changed in version 2.0: BSON Date type is signed. [4] Negative values represent dates before 1970.
Consider the following examples of BSON Date:
Construct a Date using the new Date() constructor in the mongo shell:
var mydate1 = new Date()
Construct a Date using the ISODate() constructor in the mongo shell:
var mydate2 = ISODate()
Return the Date value as string:
mydate1.toString()
Return the month portion of the Date value; months are zero-indexed, so that January is month 0:
mydate1.getMonth()
| [4] | Prior to version 2.0, Date values were incorrectly interpreted as unsigned integers, which affected sorts, range queries, and indexes on Date fields. Because indexes are not recreated when upgrading, please re-index if you created an index on Date values with an earlier version, and dates before 1970 are relevant to your application. |
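Because the value is signed, a negative millisecond count produces a pre-1970 date. This sketch uses JavaScript's own Date, which shares the milliseconds-since-epoch representation:

```javascript
// One day's worth of milliseconds before the Unix epoch lands on
// December 31, 1969 (UTC).
var before1970 = new Date(-24 * 60 * 60 * 1000);
before1970.getUTCFullYear();   // 1969
```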
These documents provide an overview and examples of CRUD operations in MongoDB.
Of the four basic database operations (i.e. CRUD), create operations are those that add new records or documents to a collection in MongoDB. For general information about write operations and the factors that affect their performance, see Write Operations; for documentation of the other CRUD operations, see the CRUD page.
You can create documents in a MongoDB collection using any of the following basic operations.
All insert operations in MongoDB exhibit the following properties:
If you attempt to insert a document without the _id field, the client library or the mongod instance will add an _id field and populate the field with a unique ObjectId.
For operations with write concern, if you specify an _id field, the _id field must be unique within the collection; otherwise the mongod will return a duplicate key exception.
The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS.
Documents have the following restrictions on field names:
Note
As of these driver versions, all write operations will issue a getLastError command to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern in the Write Operations document for more information.
The insert() method is the primary method to insert a document or documents into a MongoDB collection, and has the following syntax:
db.collection.insert( <document> )
Corresponding Operation in SQL
The insert() method is analogous to the INSERT statement.
Consider the following examples that illustrate the behavior of insert():
If the collection does not exist [1], then the insert() method creates the collection during the first insert. Specifically in the example, if the collection bios does not exist, then the insert operation will create this collection:
db.bios.insert(
{
_id: 1,
name: { first: 'John', last: 'Backus' },
birth: new Date('Dec 03, 1924'),
death: new Date('Mar 17, 2007'),
contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
awards: [
{
award: 'W.W. McDowell Award',
year: 1967,
by: 'IEEE Computer Society'
},
{
award: 'National Medal of Science',
year: 1975,
by: 'National Science Foundation'
},
{
award: 'Turing Award',
year: 1977,
by: 'ACM'
},
{
award: 'Draper Prize',
year: 1993,
by: 'National Academy of Engineering'
}
]
}
)
You can confirm the insert by querying the bios collection:
db.bios.find()
This operation returns the following document from the bios collection:
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"birth" : ISODate("1924-12-03T05:00:00Z"),
"death" : ISODate("2007-03-17T04:00:00Z"),
"contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
"awards" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
{
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering"
}
]
}
If the new document does not contain an _id field, then the insert() method adds the _id field to the document and generates a unique ObjectId for the value.
db.bios.insert(
{
name: { first: 'John', last: 'McCarthy' },
birth: new Date('Sep 04, 1927'),
death: new Date('Dec 24, 2011'),
contribs: [ 'Lisp', 'Artificial Intelligence', 'ALGOL' ],
awards: [
{
award: 'Turing Award',
year: 1971,
by: 'ACM'
},
{
award: 'Kyoto Prize',
year: 1988,
by: 'Inamori Foundation'
},
{
award: 'National Medal of Science',
year: 1990,
by: 'National Science Foundation'
}
]
}
)
You can verify the inserted document by querying the bios collection:
db.bios.find( { name: { first: 'John', last: 'McCarthy' } } )
The returned document contains an _id field with the generated ObjectId value:
{
"_id" : ObjectId("50a1880488d113a4ae94a94a"),
"name" : { "first" : "John", "last" : "McCarthy" },
"birth" : ISODate("1927-09-04T04:00:00Z"),
"death" : ISODate("2011-12-24T05:00:00Z"),
"contribs" : [ "Lisp", "Artificial Intelligence", "ALGOL" ],
"awards" : [
{
"award" : "Turing Award",
"year" : 1971,
"by" : "ACM"
},
{
"award" : "Kyoto Prize",
"year" :1988,
"by" : "Inamori Foundation"
},
{
"award" : "National Medal of Science",
"year" : 1990,
"by" : "National Science Foundation"
}
]
}
If you pass an array of documents to the insert() method, the insert() performs a bulk insert into a collection.
The following operation inserts three documents into the bios collection. The operation also illustrates the dynamic schema characteristic of MongoDB. Although the document with _id: 3 contains a field title which does not appear in the other documents, MongoDB does not require the other documents to contain this field:
db.bios.insert(
[
{
_id: 3,
name: { first: 'Grace', last: 'Hopper' },
title: 'Rear Admiral',
birth: new Date('Dec 09, 1906'),
death: new Date('Jan 01, 1992'),
contribs: [ 'UNIVAC', 'compiler', 'FLOW-MATIC', 'COBOL' ],
awards: [
{
award: 'Computer Sciences Man of the Year',
year: 1969,
by: 'Data Processing Management Association'
},
{
award: 'Distinguished Fellow',
year: 1973,
by: 'British Computer Society'
},
{
award: 'W. W. McDowell Award',
year: 1976,
by: 'IEEE Computer Society'
},
{
award: 'National Medal of Technology',
year: 1991,
by: 'United States'
}
]
},
{
_id: 4,
name: { first: 'Kristen', last: 'Nygaard' },
birth: new Date('Aug 27, 1926'),
death: new Date('Aug 10, 2002'),
contribs: [ 'OOP', 'Simula' ],
awards: [
{
award: 'Rosing Prize',
year: 1999,
by: 'Norwegian Data Association'
},
{
award: 'Turing Award',
year: 2001,
by: 'ACM'
},
{
award: 'IEEE John von Neumann Medal',
year: 2001,
by: 'IEEE'
}
]
},
{
_id: 5,
name: { first: 'Ole-Johan', last: 'Dahl' },
birth: new Date('Oct 12, 1931'),
death: new Date('Jun 29, 2002'),
contribs: [ 'OOP', 'Simula' ],
awards: [
{
award: 'Rosing Prize',
year: 1999,
by: 'Norwegian Data Association'
},
{
award: 'Turing Award',
year: 2001,
by: 'ACM'
},
{
award: 'IEEE John von Neumann Medal',
year: 2001,
by: 'IEEE'
}
]
}
]
)
| [1] | You can also view a list of the existing collections in the database using the show collections operation in the mongo shell. |
The save() method is a specialized upsert that uses the _id field in the <document> argument to determine whether to perform an insert or an update:
The save() method has the following syntax:
db.collection.save( <document> )
Consider the following examples that illustrate the use of the save() method to perform inserts:
If the <document> does not contain the _id field, the save() method performs an insert. Refer to the insert section for details of the insert operation of a document without an _id field.
The following operation performs an insert into the bios collection since the document does not contain the _id field:
db.bios.save(
{
name: { first: 'Guido', last: 'van Rossum'},
birth: new Date('Jan 31, 1956'),
contribs: [ 'Python' ],
awards: [
{
award: 'Award for the Advancement of Free Software',
year: 2001,
by: 'Free Software Foundation'
},
{
award: 'NLUUG Award',
year: 2003,
by: 'NLUUG'
}
]
}
)
If the <document> contains an _id field but has a value not found in the collection, the save() method performs an insert. Refer to the insert section for details of the insert operation.
The following operation performs an insert into the bios collection since the document contains an _id field whose value 10 is not found in the bios collection:
db.bios.save(
{
_id: 10,
name: { first: 'Yukihiro', aka: 'Matz', last: 'Matsumoto'},
birth: new Date('Apr 14, 1965'),
contribs: [ 'Ruby' ],
awards: [
{
award: 'Award for the Advancement of Free Software',
year: '2011',
by: 'Free Software Foundation'
}
]
}
)
An upsert eliminates the need to perform a separate database call to check for the existence of a record before performing either an update or an insert operation. Typically update operations update existing documents, but in MongoDB, the update() operation can accept an <upsert> option as an argument. Upserts are a hybrid operation that uses the <query> argument to determine the write operation:
Consider the following syntax for an upsert operation:
db.collection.update( <query>,
<update>,
{ upsert: true } )
The following examples illustrate the use of the upsert to perform create operations:
If no document matches the <query> argument, the upsert performs an insert. If the <update> argument includes only field and value pairs, the new document contains the fields and values specified in the <update> argument. If the _id field is omitted, the operation adds the _id field and generates a unique ObjectId for its value.
The following upsert operation inserts a new document into the bios collection:
db.bios.update(
{ name: { first: 'Dennis', last: 'Ritchie'} },
{
name: { first: 'Dennis', last: 'Ritchie'},
birth: new Date('Sep 09, 1941'),
died: new Date('Oct 12, 2011'),
contribs: [ 'UNIX', 'C' ],
awards: [
{
award: 'Turing Award',
year: 1983,
by: 'ACM'
},
{
award: 'National Medal of Technology',
year: 1998,
by: 'United States'
},
{
award: 'Japan Prize',
year: 2011,
by: 'The Japan Prize Foundation'
}
]
},
{ upsert: true }
)
If no document matches the <query> argument, the upsert operation inserts a new document. If the <update> argument includes only update operators, the new document contains the fields and values from <query> argument with the operations from the <update> argument applied.
The following operation inserts a new document into the bios collection:
db.bios.update(
{
_id: 7,
name: { first: 'Ken', last: 'Thompson' }
},
{
$set: {
birth: new Date('Feb 04, 1943'),
contribs: [ 'UNIX', 'C', 'B', 'UTF-8' ],
awards: [
{
award: 'Turing Award',
year: 1983,
by: 'ACM'
},
{
award: 'IEEE Richard W. Hamming Medal',
year: 1990,
by: 'IEEE'
},
{
award: 'National Medal of Technology',
year: 1998,
by: 'United States'
},
{
award: 'Tsutomu Kanai Award',
year: 1999,
by: 'IEEE'
},
{
award: 'Japan Prize',
year: 2011,
by: 'The Japan Prize Foundation'
}
]
}
},
{ upsert: true }
)
Of the four basic database operations (i.e. CRUD), read operations are those that retrieve records or documents from a collection in MongoDB. For general information about read operations and the factors that affect their performance, see Read Operations; for documentation of the other CRUD operations, see the CRUD page.
You can retrieve documents from MongoDB using either of the following methods:
The find() method is the primary method to select documents from a collection. The find() method returns a cursor that contains a number of documents. Most drivers provide application developers with a native iterable interface for handling cursors and accessing documents. The find() method has the following syntax:
db.collection.find( <query>, <projection> )
Corresponding Operation in SQL
The find() method is analogous to the SELECT statement, while:
Consider the following examples that illustrate the use of the find() method:
The examples refer to a collection named bios that contains documents with the following prototype:
{ "_id" : 1, "name" : { "first" : "John", "last" :"Backus" }, "birth" : ISODate("1924-12-03T05:00:00Z"), "death" : ISODate("2007-03-17T04:00:00Z"), "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ], "awards" : [ { "award" : "W.W. McDowellAward", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "National Medal of Science", "year" : 1975, "by" : "National Science Foundation" }, { "award" : "Turing Award", "year" : 1977, "by" : "ACM" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }
Note
In the mongo shell, you can format the output by adding .pretty() to the find() method call.
If there is no <query> argument, the find() method selects all documents from a collection.
The following operation returns all documents (or more precisely, a cursor to all documents) in the bios collection:
db.bios.find()
If there is a <query> argument, the find() method selects all documents from a collection that satisfy the criteria of the query:
The following operation returns all documents in the bios collection where the field _id equals 5 or ObjectId("507c35dd8fada716c89d0013"):
db.bios.find(
{
_id: { $in: [ 5, ObjectId("507c35dd8fada716c89d0013") ] }
}
)
The following operation returns all documents in the bios collection where the array field contribs contains the element 'UNIX':
db.bios.find(
{
contribs: 'UNIX'
}
)
The following operation returns all documents in the bios collection where awards array contains a subdocument element that contains the award field equal to 'Turing Award' and the year field greater than 1980:
db.bios.find(
{
awards: {
$elemMatch: {
award: 'Turing Award',
year: { $gt: 1980 }
}
}
}
)
The following operation returns all documents in the bios collection where the subdocument name contains a field first with the value 'Yukihiro' and a field last with the value 'Matsumoto'; the query uses dot notation to access fields in a subdocument:
db.bios.find(
{
'name.first': 'Yukihiro',
'name.last': 'Matsumoto'
}
)
The query matches the document where the name field contains a subdocument with the field first with the value 'Yukihiro' and a field last with the value 'Matsumoto'. For instance, the query would match documents with name fields that held either of the following values:
{
first: 'Yukihiro',
aka: 'Matz',
last: 'Matsumoto'
}
{
last: 'Matsumoto',
first: 'Yukihiro'
}
The following operation returns all documents in the bios collection where the subdocument name is exactly { first: 'Yukihiro', last: 'Matsumoto' }, including the order:
db.bios.find(
{
name: {
first: 'Yukihiro',
last: 'Matsumoto'
}
}
)
The name field must match the sub-document exactly, including order. For instance, the query would not match documents with name fields that held either of the following values:
{
first: 'Yukihiro',
aka: 'Matz',
last: 'Matsumoto'
}
{
last: 'Matsumoto',
first: 'Yukihiro'
}
The following operation returns all documents in the bios collection where either the field first in the sub-document name starts with the letter G or where the field birth is less than new Date('01/01/1945'):
db.bios.find(
{ $or: [
{ 'name.first' : /^G/ },
{ birth: { $lt: new Date('01/01/1945') } }
]
}
)
The following operation returns all documents in the bios collection where the field first in the subdocument name starts with the letter K and the array field contribs contains the element UNIX:
db.bios.find(
{
'name.first': /^K/,
contribs: 'UNIX'
}
)
In this query, the parameters (i.e. the selections of both fields) combine using an implicit logical AND for criteria on different fields contribs and name.first. For multiple AND criteria on the same field, use the $and operator.
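The distinction can be seen by comparing the two forms as plain query documents; both are sketches, and the $and example (requiring contribs to contain both 'UNIX' and 'C') is hypothetical rather than drawn from the examples above:

```javascript
// Criteria on different fields combine with an implicit AND:
var implicitAnd = { 'name.first': /^K/, contribs: 'UNIX' };

// Multiple criteria on the same field need the explicit $and
// operator, because an object literal cannot repeat a key:
var explicitAnd = { $and: [ { contribs: 'UNIX' }, { contribs: 'C' } ] };
```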
If there is a <projection> argument, the find() method returns only those fields as specified in the <projection> argument to include or exclude:
Note
The _id field is implicitly included in the <projection> argument. In projections that explicitly include fields, _id is the only field that you can explicitly exclude. Otherwise, you cannot mix include field and exclude field specifications.
The following operation finds all documents in the bios collection and returns only the name field, the contribs field, and the _id field:
db.bios.find(
{ },
{ name: 1, contribs: 1 }
)
The following operation finds all documents in the bios collection and returns only the name field and the contribs field:
db.bios.find(
{ },
{ name: 1, contribs: 1, _id: 0 }
)
The following operation finds the documents in the bios collection where the contribs field contains the element 'OOP' and returns all fields except the _id field, the first field in the name subdocument, and the birth field from the matching documents:
db.bios.find(
{ contribs: 'OOP' },
{ _id: 0, 'name.first': 0, birth: 0 }
)
The following operation finds all documents in the bios collection and returns the last field in the name subdocument and the first two elements in the contribs field:
db.bios.find(
{ },
{
_id: 0,
'name.last': 1,
contribs: { $slice: 2 }
}
)
See also
The find() method returns a cursor to the results; however, in the mongo shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times [1] to print up to the first 20 documents that match the query, as in the following example:
db.bios.find( { _id: 1 } );
When you assign the find() to a variable:
you can type the name of the cursor variable to iterate up to 20 times [1] and print the matching documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } );
myCursor
you can use the cursor method next() to access the documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } );
var myDocument = myCursor.hasNext() ? myCursor.next() : null;
if (myDocument) {
var myName = myDocument.name;
print(tojson(myName));
}
To print, you can also use the printjson() method instead of print(tojson()):
if (myDocument) {
var myName = myDocument.name;
printjson(myName);
}
you can use the cursor method forEach() to iterate the cursor and access the documents, as in the following example:
var myCursor = db.bios.find( { _id: 1 } );
myCursor.forEach(printjson);
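The cursor interface used in the examples above can be sketched with a minimal array-backed stand-in. The real mongo shell cursor fetches documents from the server in batches; this sketch only illustrates the hasNext()/next()/forEach() access pattern:

```javascript
// Minimal array-backed sketch of a cursor (illustrative only; not the
// mongo shell implementation).
function makeCursor(docs) {
  var i = 0;
  return {
    hasNext: function () { return i < docs.length; },
    next: function () { return docs[i++]; },
    forEach: function (fn) { while (i < docs.length) fn(docs[i++]); }
  };
}

// Hypothetical single-document result set.
var myCursor = makeCursor([ { _id: 1, name: { first: 'John' } } ]);
var myDocument = myCursor.hasNext() ? myCursor.next() : null;
var myName = myDocument ? myDocument.name : null;
```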
For more information on cursor handling, see:
| [1] | (1, 2) You can use the DBQuery.shellBatchSize to change the number of iterations from the default value of 20. See Cursor Flags and Cursor Behaviors for more information. |
In addition to the <query> and the <projection> arguments, the mongo shell and the drivers provide several cursor methods that you can call on the cursor returned by the find() method to modify its behavior, such as:
sort, which orders the documents in the result set according to the field or fields specified to the method.
The following operation returns all documents (or more precisely, a cursor to all documents) in the bios collection ordered by the name field ascending:
db.bios.find().sort( { name: 1 } )
sort() corresponds to the ORDER BY statement in SQL.
The limit() method limits the number of documents in the result set.
The following operation returns at most 5 documents (or more precisely, a cursor to at most 5 documents) in the bios collection:
db.bios.find().limit( 5 )
limit() corresponds to the LIMIT statement in SQL.
The skip() method controls the starting point of the results set.
The following operation returns all documents, skipping the first 5 documents in the bios collection:
db.bios.find().skip( 5 )
You can chain these cursor methods, as in the following examples [2]:
db.bios.find().sort( { name: 1 } ).limit( 5 )
db.bios.find().limit( 5 ).sort( { name: 1 } )
See the JavaScript cursor methods reference and your driver documentation for additional references. See Cursors for more information regarding cursors.
| [2] | Regardless of the order you chain the limit() and the sort(), the request to the server has the following structure that treats the query and the sort() modifier as a single object. Therefore, the limit() operation method is always applied after the sort() regardless of the specified order of the operations in the chain. See the meta query operators for more information. |
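Why chaining order does not matter can be sketched in plain JavaScript: both chains accumulate the same specification before anything is sent, and the sort modifier travels inside the query object while the limit is sent separately. The field names below mirror the $query/$orderby meta query operators, but the function itself is purely illustrative:

```javascript
// Sketch of the request structure built from a chained find() call.
// Both chain orders produce the same spec, so the server always applies
// the sort before the limit.
function buildRequest(spec) {
  return {
    query: { $query: spec.query, $orderby: spec.sort },
    limit: spec.limit
  };
}

// .sort(...).limit(5) and .limit(5).sort(...) accumulate identical specs.
var a = buildRequest({ query: {}, sort: { name: 1 }, limit: 5 });
var b = buildRequest({ query: {}, sort: { name: 1 }, limit: 5 });
```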
The findOne() method selects a single document from a collection and returns that document; it does not return a cursor.
The findOne() method has the following syntax:
db.collection.findOne( <query>, <projection> )
Except for the return value, the findOne() method is quite similar to the find() method; in fact, internally, the findOne() method is the find() method with a limit of 1.
Consider the following examples that illustrate the use of the findOne() method:
If there is no <query> argument, the findOne() method selects just one document from a collection.
The following operation returns a single document from the bios collection:
db.bios.findOne()
If there is a <query> argument, the findOne() method selects the first document from a collection that meets the <query> argument:
The following operation returns the first matching document from the bios collection where either the field first in the subdocument name starts with the letter G or where the field birth is less than new Date('01/01/1945'):
db.bios.findOne(
{
$or: [
{ 'name.first' : /^G/ },
{ birth: { $lt: new Date('01/01/1945') } }
]
}
)
You can pass a <projection> argument to findOne() to control the fields included in the result set:
The following operation finds a document in the bios collection and returns only the name field, the contribs field, and the _id field:
db.bios.findOne(
{ },
{ name: 1, contribs: 1 }
)
The following operation returns a document in the bios collection where the contribs field contains the element OOP and returns all fields except the _id field, the first field in the name subdocument, and the birth field from the matching documents:
db.bios.findOne(
{ contribs: 'OOP' },
{ _id: 0, 'name.first': 0, birth: 0 }
)
Because the findOne() method returns a document rather than a cursor, you cannot apply cursor methods such as limit(), sort(), and skip() to its result. However, you can access the document directly, as in the following example:
var myDocument = db.bios.findOne();
if (myDocument) {
var myName = myDocument.name;
print(tojson(myName));
}
Of the four basic database operations (i.e. CRUD), update operations are those that modify existing records or documents in a MongoDB collection. For general information about write operations and the factors that affect their performance, see Write Operations; for documentation of other CRUD operations, see the CRUD page.
An update operation modifies an existing document or documents in a collection. MongoDB provides the following methods to perform update operations:
Note
Consider the following behaviors of MongoDB’s update operations.
When performing update operations that increase the document size beyond the allocated space for that document, the update operation relocates the document on disk and may reorder the document fields depending on the type of update.
As of these driver versions, all write operations will issue a getLastError command to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern in the Write Operations document for more information.
The update() method is the primary method used to modify documents in a MongoDB collection. By default, the update() method updates a single document, but by using the multi option, update() can update all documents that match the query criteria in the collection. The update() method can either replace the existing document with the new document or update specific fields in the existing document.
The update() method has the following syntax:
db.collection.update( <query>, <update>, <options> )
Corresponding operation in SQL
The update() method corresponds to the UPDATE operation in SQL, and:
The default behavior of the update() method updates a single document and corresponds to the SQL UPDATE statement with a LIMIT 1 clause. With the multi option, the update() method corresponds to the SQL UPDATE statement without a LIMIT clause.
Consider the following examples that illustrate the use of the update() method:
If the <update> argument contains only update operator expressions such as the $set operator expression, the update() method updates the corresponding fields in the document. To update fields in subdocuments, MongoDB uses dot notation.
The following operation queries the bios collection for the first document that has an _id field equal to 1 and sets the field named middle to the value Warner in the name subdocument and adds a new element to the awards field:
db.bios.update(
{ _id: 1 },
{
$set: { 'name.middle': 'Warner' },
$push: { awards: { award: 'IBM Fellow', year: 1963, by: 'IBM' } }
}
)
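The effect of these two operators can be sketched in plain JavaScript against a hypothetical in-memory document; the server performs the equivalent mutation atomically on the stored document:

```javascript
// Sketch of the $set (dot notation) and $push effects from the update
// above, applied to a hypothetical document.
var doc = {
  _id: 1,
  name: { first: 'John', last: 'Backus' },
  awards: [ { award: 'National Medal of Science', year: 1975, by: 'NSF' } ]
};

// $set with dot notation writes one field inside the name subdocument.
doc.name.middle = 'Warner';

// $push appends one element to the awards array.
doc.awards.push({ award: 'IBM Fellow', year: 1963, by: 'IBM' });
```

All other fields of the document are left untouched, which is what distinguishes an operator update from a whole-document replacement.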
If the <update> argument contains the $unset operator, the update() method removes the specified field from the document.
The following operation queries the bios collection for the first document that has an _id field equal to 3 and removes the birth field from the document:
db.bios.update( { _id: 3 }, { $unset: { birth: 1 } } )
If the <update> argument contains fields not currently in the document, the update() method adds the new fields to the document.
The following operation queries the bios collection for the first document that has an _id field equal to 3 and adds to the document a new mbranch field and a new aka field in the subdocument name:
db.bios.update(
{ _id: 3 },
{ $set: {
mbranch: 'Navy',
'name.aka': 'Amazing Grace'
}
}
)
If the <update> argument contains only field and value pairs, the update() method replaces the existing document with the document in the <update> argument, except for the _id field.
The following operation queries the bios collection for the first document that has a name field equal to { first: 'John', last: 'McCarthy' } and replaces all but the _id field in the document with the fields in the <update> argument:
db.bios.update(
{ name: { first: 'John', last: 'McCarthy' } },
{ name: { first: 'Ken', last: 'Iverson' },
born: new Date('Dec 17, 1941'),
died: new Date('Oct 19, 2004'),
contribs: [ 'APL', 'J' ],
awards: [
{ award: 'Turing Award',
year: 1979,
by: 'ACM' },
{ award: 'Harry H. Goode Memorial Award',
year: 1975,
by: 'IEEE Computer Society' },
{ award: 'IBM Fellow',
year: 1970,
by: 'IBM' }
]
}
)
If the update operation requires an update of an element in an array field:
The update() method can perform the update using the position of the element and dot notation. Arrays in MongoDB are zero-based.
The following operation queries the bios collection for the first document with _id field equal to 1 and updates the second element in the contribs array:
db.bios.update(
{ _id: 1 },
{ $set: { 'contribs.1': 'ALGOL 58' } }
)
The update() method can perform the update using the $ positional operator if the position is not known. The array field must appear in the query argument in order to determine which array element to update.
The following operation queries the bios collection for the first document where the _id field equals 3 and the contribs array contains an element equal to compiler. If found, the update() method updates the first matching element in the array to A compiler in the document:
db.bios.update(
{ _id: 3, 'contribs': 'compiler' },
{ $set: { 'contribs.$': 'A compiler' } }
)
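The $ positional operator's behavior can be sketched in plain JavaScript: the query locates the index of the first matching array element, and the update writes to that index. The document below is hypothetical:

```javascript
// Sketch of the $ positional operator: find the first matching element,
// then update at that position.
var doc = { _id: 3, contribs: [ 'UNIVAC', 'compiler', 'FLOW-MATIC' ] };

var pos = doc.contribs.indexOf('compiler');  // the position "$" resolves to
if (pos !== -1) {
  doc.contribs[pos] = 'A compiler';
}
```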
The update() method can perform the update of an array that contains subdocuments by using the positional operator (i.e. $) and the dot notation.
The following operation queries the bios collection for the first document where the _id field equals 6 and the awards array contains a subdocument element with the by field equal to ACM. If found, the update() method updates the by field in the first matching subdocument:
db.bios.update(
{ _id: 6, 'awards.by': 'ACM' } ,
{ $set: { 'awards.$.by': 'Association for Computing Machinery' } }
)
If the <options> argument contains the multi option set to true or 1, the update() method updates all documents that match the query.
The following operation queries the bios collection for all documents where the awards field contains a subdocument element with the award field equal to Turing and sets the turing field to true in the matching documents:
db.bios.update(
{ 'awards.award': 'Turing' },
{ $set: { turing: true } },
{ multi: true }
)
If you set the upsert option in the <options> argument to true or 1 and no existing document matches the <query> argument, the update() method inserts a new document into the collection.
The following operation queries the bios collection for a document with the _id field equal to 11 and the name field equal to { first: 'James', last: 'Gosling'}. If the query selects a document, the operation performs an update operation. If a document is not found, update() inserts a new document containing the fields and values from the <query> argument with the operations from the <update> argument applied. [1]
db.bios.update(
{ _id:11, name: { first: 'James', last: 'Gosling' } },
{
$set: {
born: new Date('May 19, 1955'),
contribs: [ 'Java' ],
awards: [
{ award: 'The Economist Innovation Award',
year: 2002,
by: 'The Economist' },
{ award: 'Officer of the Order of Canada',
year: 2007,
by: 'Canada' }
]
}
},
{ upsert: true }
)
See also Create with Upsert.
| [1] | If the <update> argument includes only field and value pairs, the new document contains the fields and values specified in the <update> argument. If the <update> argument includes only update operators, the new document contains the fields and values from the <query> argument with the operations from the <update> argument applied. |
The save() method updates an existing document or inserts a document depending on the _id field of the document. The save() method is analogous to the update() method with the upsert option and a <query> argument on the _id field.
The save() method has the following syntax:
db.collection.save( <document> )
Consider the following examples of the save() method:
If the <document> argument contains the _id field that exists in the collection, the save() method performs an update that replaces the existing document with the <document> argument.
The following operation queries the bios collection for a document where the _id equals ObjectId("507c4e138fada716c89d0014") and replaces the document with the <document> argument:
db.bios.save(
{
_id: ObjectId("507c4e138fada716c89d0014"),
name: { first: 'Martin', last: 'Odersky' },
contribs: [ 'Scala' ]
}
)
If no _id field exists or if the _id field exists but does not match any document in the collection, the save() method performs an insert.
The following operation adds the _id field to the document, assigns to the field a unique ObjectId, and inserts the document into the bios collection:
db.bios.save(
{
name: { first: 'Larry', last: 'Wall' },
contribs: [ 'Perl' ]
}
)
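The two save() behaviors can be sketched together over a hypothetical in-memory collection: replace when _id matches an existing document, insert (with a generated id) otherwise. The id generation here is a stand-in, not a real ObjectId:

```javascript
// Sketch of save() semantics over a hypothetical in-memory collection.
var nextId = 1;
var collection = {};  // maps _id -> document

function save(document) {
  if (document._id === undefined) {
    document._id = 'ObjectId-' + (nextId++);  // stand-in for a generated ObjectId
  }
  collection[document._id] = document;  // insert, or whole-document replace
  return document._id;
}

// First call inserts; second call (same _id) replaces the document.
var id = save({ name: { first: 'Larry', last: 'Wall' }, contribs: [ 'Perl' ] });
save({ _id: id, name: { first: 'Larry', last: 'Wall' }, contribs: [ 'Perl', 'patch' ] });
```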
See also
Of the four basic database operations (i.e. CRUD), delete operations are those that remove documents from a collection in MongoDB.
For general information about write operations and the factors that affect their performance, see Write Operations; for documentation of other CRUD operations, see the CRUD page.
The remove() method in the mongo shell provides this operation, as do corresponding methods in the drivers.
Note
As of these driver versions, all write operations will issue a getLastError command to confirm the result of the write operation:
{ getLastError: 1 }
Refer to the documentation on write concern in the Write Operations document for more information.
Use the remove() method to delete documents from a collection; this action does not remove the indexes. [1]
The remove() method has the following syntax:
db.collection.remove( <query>, <justOne> )
Corresponding operation in SQL
The remove() method is analogous to the DELETE statement, and:
Consider the following examples that illustrate the use of the remove():
If there is a <query> argument, the remove() method deletes from the collection all documents that match the argument.
The following operation deletes all documents from the bios collection where the subdocument name contains a field first whose value starts with G:
db.bios.remove( { 'name.first' : /^G/ } )
If there is a <query> argument and you specify the <justOne> argument as true or 1, remove() only deletes a single document from the collection that matches the query.
The following operation deletes a single document from the bios collection where the turing field equals true:
db.bios.remove( { turing: true }, 1 )
If there is no <query> argument, the remove() method deletes all documents from a collection. The following operation deletes all documents from the bios collection:
db.bios.remove()
Note
This operation is not equivalent to the drop() method.
| [1] | To remove all documents from a collection, it may be faster to use the drop() method to drop the entire collection, including the indexes, and then recreate the collection and rebuild the indexes. |
You cannot apply the remove() method to a capped collection.
If the <query> argument to the remove() method matches multiple documents in the collection, the delete operation may interleave with other write operations to that collection. For an unsharded collection, you have the option to override this behavior with the $atomic isolation operator, effectively isolating the delete operation from other write operations. To isolate the operation, include $atomic: 1 in the <query> parameter as in the following example:
db.bios.remove( { turing: true, $atomic: 1 } )
In version 2.2, MongoDB introduced the aggregation framework that provides a powerful and flexible set of tools to use for many data aggregation tasks. If you’re familiar with data aggregation in SQL, consider the SQL to Aggregation Framework Mapping Chart document as an introduction to some of the basic concepts in the aggregation framework. Consider the full documentation of the aggregation framework here:
New in version 2.1.
The MongoDB aggregation framework provides a means to calculate aggregated values without having to use map-reduce. While map-reduce is powerful, it is often more difficult than necessary for many simple aggregation tasks, such as totaling or averaging field values.
If you’re familiar with SQL, the aggregation framework provides similar functionality to GROUP BY and related SQL operators as well as simple forms of “self joins.” Additionally, the aggregation framework provides projection capabilities to reshape the returned data. Using the projections in the aggregation framework, you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
See also
A presentation from MongoSV 2011: MongoDB’s New Aggregation Framework.
Additionally, consider Aggregation Framework Examples and Aggregation Framework Reference for additional documentation.
This section provides an introduction to the two concepts that underpin the aggregation framework: pipelines and expressions.
Conceptually, documents from a collection pass through an aggregation pipeline, which transforms these objects as they pass through. For those familiar with UNIX-like shells (e.g. bash), the concept is analogous to the pipe (i.e. |) used to string text filters together.
In a shell environment the pipe redirects a stream of characters from the output of one process to the input of the next. The MongoDB aggregation pipeline streams MongoDB documents from one pipeline operator to the next to process the documents. Pipeline operators can be repeated in the pipe.
All pipeline operators process a stream of documents and the pipeline behaves as if the operation scans a collection and passes all matching documents into the “top” of the pipeline. Each operator in the pipeline transforms each document as it passes through the pipeline.
Note
Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
Warning
The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.
Expressions produce output documents based on calculations performed on input documents. The aggregation framework defines expressions in a prefix document format.
Expressions are stateless and are only evaluated when seen by the aggregation process. All aggregation expressions can only operate on the current document in the pipeline, and cannot integrate data from other documents.
The accumulator expressions used in the $group operator are an exception: they maintain state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.
See also
Aggregation expressions for additional examples of the expressions provided by the aggregation framework.
Invoke an aggregation operation with the aggregate() wrapper in the mongo shell or the aggregate database command. Always call aggregate() on a collection object that determines the input documents of the aggregation pipeline. The arguments to the aggregate() method specify a sequence of pipeline operators, where each operator may have a number of operands.
First, consider a collection of documents named articles using the following format:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The following example aggregation operation pivots data to create a set of author names grouped by tags applied to an article. Call the aggregation framework by issuing the following command:
db.articles.aggregate(
{ $project : {
author : 1,
tags : 1,
} },
{ $unwind : "$tags" },
{ $group : {
_id : { tags : "$tags" },
authors : { $addToSet : "$author" }
} }
);
The aggregation pipeline begins with the collection articles and selects the author and tags fields using the $project aggregation operator. The $unwind operator produces one output document per tag. Finally, the $group operator pivots these fields.
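The data flow through these three stages can be sketched in plain JavaScript over a tiny hypothetical articles array; the server evaluates the stages natively, so this only illustrates what each stage does to the documents:

```javascript
// Hypothetical input documents (abbreviated).
var articles = [
  { title: 't1', author: 'bob', tags: [ 'fun', 'good', 'fun' ] },
  { title: 't2', author: 'joe', tags: [ 'fun' ] }
];

// $project: keep only the author and tags fields.
var projected = articles.map(function (d) {
  return { author: d.author, tags: d.tags };
});

// $unwind: one output document per element of the tags array.
var unwound = [];
projected.forEach(function (d) {
  d.tags.forEach(function (t) {
    unwound.push({ author: d.author, tags: t });
  });
});

// $group with $addToSet: collect the set of unique authors per tag value.
var groups = {};
unwound.forEach(function (d) {
  groups[d.tags] = groups[d.tags] || [];
  if (groups[d.tags].indexOf(d.author) === -1) groups[d.tags].push(d.author);
});
```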
The aggregation operation in the previous section returns a document with two fields:
As a document, the result is subject to the BSON Document size limit, which is currently 16 megabytes.
Because you will always call aggregate on a collection object, which logically inserts the entire collection into the aggregation pipeline, you may want to optimize the operation by avoiding scanning the entire collection whenever possible.
Depending on the order in which they appear in the pipeline, aggregation operators can take advantage of indexes.
The following pipeline operators take advantage of an index when they occur at the beginning of the pipeline:
The above operators can also use an index when placed before the following aggregation operators:
If your aggregation operation requires only a subset of the data in a collection, use the $match operator to restrict which items go into the top of the pipeline, as in a query. When placed early in a pipeline, these $match operations use suitable indexes to scan only the matching documents in a collection.
Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort, and can use an index.
In future versions there may be an optimization phase in the pipeline that reorders the operations to increase performance without affecting the result. However, at this time place $match operators at the beginning of the pipeline when possible.
Certain pipeline operators require access to the entire input set before they can produce any output. For example, $sort must receive all of the input from the preceding pipeline operator before it can produce its first output document. The current implementation of $sort does not go to disk in these cases: in order to sort the contents of the pipeline, the entire input must fit in memory.
$group has similar characteristics: Before any $group passes its output along the pipeline, it must receive the entirety of its input. For the $group operator, this frequently does not require as much memory as $sort, because it only needs to retain one record for each unique key in the grouping specification.
The current implementation of the aggregation framework logs a warning if a cumulative operator consumes 5% or more of the physical memory on the host. Cumulative operators produce an error if they consume 10% or more of the physical memory on the host.
Note
Changed in version 2.1.
Some aggregation operations using aggregate will cause mongos instances to require more CPU resources than in previous versions. This modified performance profile may dictate alternate architectural decisions if you use the aggregation framework extensively in a sharded environment.
The aggregation framework is compatible with sharded collections.
When operating on a sharded collection, the aggregation pipeline is split into two parts. The aggregation framework pushes all of the operators up to the first $group or $sort operation to each shard. [1] Then, a second pipeline on the mongos runs. This pipeline consists of the first $group or $sort and any remaining pipeline operators, and runs on the results received from the shards.
The $group operator brings in any “sub-totals” from the shards and combines them: in some cases these may be structures. For example, the $avg expression maintains a total and count for each shard; mongos combines these values and then divides.
| [1] | If an early $match can exclude shards through the use of the shard key in the predicate, then these operators are only pushed to the relevant shards. |
Aggregation operations with the aggregate command have the following limitations:
MongoDB provides flexible data aggregation functionality with the aggregate command. For additional information about aggregation consider the following resources:
This document provides a number of practical examples that display the capabilities of the aggregation framework. All examples use a publicly available data set of all zipcodes and populations in the United States.
mongod and mongo, version 2.2 or later.
To run these examples, you will need the zipcode data set. These data are available at media.mongodb.org/zips.json. Use mongoimport to load this data set into your mongod instance.
Each document in this collection has the following form:
{
"_id": "10280",
"city": "NEW YORK",
"state": "NY",
"pop": 5574,
"loc": [
-74.016323,
40.710537
]
}
In these documents:
All of the following examples use the aggregate() helper in the mongo shell. aggregate() provides a wrapper around the aggregate database command. See the documentation for your driver for a more idiomatic interface for data aggregation operations.
To return all states with a population greater than 10 million, use the following aggregation operation:
db.zipcodes.aggregate( { $group :
{ _id : "$state",
totalPop : { $sum : "$pop" } } },
{ $match : {totalPop : { $gte : 10*1000*1000 } } } )
Aggregation operations using the aggregate() helper process all documents in the zipcodes collection. aggregate() takes a number of pipeline operators that define the aggregation process.
In the above example, the pipeline passes all documents in the zipcodes collection through the following steps:
the $group operator collects all documents and creates documents for each state.
These new per-state documents have one field in addition to the _id field: totalPop, a generated field that uses the $sum operator to calculate the total value of all pop fields in the source documents.
After the $group operation the documents in the pipeline resemble the following:
{
"_id" : "AK",
"totalPop" : 550043
}
the $match operation filters these documents so that the only documents that remain are those where the value of totalPop is greater than or equal to 10 million.
The $match operation does not alter the documents, which have the same format as the documents output by $group.
The equivalent SQL for this operation is:
SELECT state, SUM(pop) AS pop
FROM zips
GROUP BY state
HAVING pop > (10*1000*1000)
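The $group and $match stages above can be sketched in plain JavaScript over a few hypothetical zipcode documents (real totals come from the full data set):

```javascript
// Hypothetical zipcode documents (abbreviated).
var zipcodes = [
  { state: 'NY', pop: 6000000 },
  { state: 'NY', pop: 5000000 },
  { state: 'AK', pop: 550043 }
];

// $group with $sum: total population per state.
var totals = {};
zipcodes.forEach(function (z) {
  totals[z.state] = (totals[z.state] || 0) + z.pop;
});

// $match: keep only states with at least 10 million people.
var big = Object.keys(totals).filter(function (s) {
  return totals[s] >= 10 * 1000 * 1000;
});
```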
To return the average populations for cities in each state, use the following aggregation operation:
db.zipcodes.aggregate( { $group :
{ _id : { state : "$state", city : "$city" },
pop : { $sum : "$pop" } } },
{ $group :
{ _id : "$_id.state",
avgCityPop : { $avg : "$pop" } } } )
Aggregation operations using the aggregate() helper process all documents in the zipcodes collection. aggregate() takes a number of pipeline operators that define the aggregation process.
In the above example, the pipeline passes all documents in the zipcodes collection through the following steps:
the $group operator collects all documents and creates new documents for every combination of the city and state fields in the source document.
After this stage in the pipeline, the documents resemble the following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
the second $group operator collects documents by the state field and uses the $avg expression to compute a value for the avgCityPop field.
The final output of this aggregation operation is:
{
"_id" : "MN",
"avgCityPop" : 5335
},
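The two $group stages can be sketched in plain JavaScript with hypothetical documents: the first stage sums pop per (state, city) pair, and the second averages those per-city totals within each state:

```javascript
// Hypothetical zipcode documents for one state (abbreviated).
var zipcodes = [
  { state: 'MN', city: 'A', pop: 3000 },
  { state: 'MN', city: 'A', pop: 2000 },
  { state: 'MN', city: 'B', pop: 7000 }
];

// First $group: total population per state+city pair.
var cityTotals = {};
zipcodes.forEach(function (z) {
  var key = z.state + '|' + z.city;
  cityTotals[key] = cityTotals[key] || { state: z.state, pop: 0 };
  cityTotals[key].pop += z.pop;
});

// Second $group with $avg: average of the per-city totals per state.
var sums = {}, counts = {};
Object.keys(cityTotals).forEach(function (key) {
  var c = cityTotals[key];
  sums[c.state] = (sums[c.state] || 0) + c.pop;
  counts[c.state] = (counts[c.state] || 0) + 1;
});
var avgCityPop = {};
Object.keys(sums).forEach(function (s) { avgCityPop[s] = sums[s] / counts[s]; });
```

Here city A totals 5000 and city B totals 7000, so the state average is 6000; note the average is over city totals, not over raw zipcode documents.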
To return the smallest and largest cities by population for each state, use the following aggregation operation:
db.zipcodes.aggregate( { $group:
{ _id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" } } },
{ $sort: { pop: 1 } },
{ $group:
{ _id : "$_id.state",
biggestCity: { $last: "$_id.city" },
biggestPop: { $last: "$pop" },
smallestCity: { $first: "$_id.city" },
smallestPop: { $first: "$pop" } } },
// the following $project is optional, and
// modifies the output format.
{ $project:
{ _id: 0,
state: "$_id",
biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
smallestCity: { name: "$smallestCity", pop: "$smallestPop" } } } )
Aggregation operations using the aggregate() helper process all documents in the zipcodes collection. aggregate() takes a number of pipeline operators that define the aggregation process.
All documents from the zipcodes collection pass into the pipeline, which consists of the following steps:
the $group operator collects all documents and creates new documents for every combination of the city and state fields in the source documents.
By specifying the value of _id as a sub-document that contains both fields, the operation preserves the state field for use later in the pipeline. The documents produced by this stage of the pipeline have a second field, pop, which uses the $sum operator to provide the total of the pop fields in the source document.
At this stage in the pipeline, the documents resemble the following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
the $sort operator orders the documents in the pipeline based on the value of the pop field, from smallest to largest. This operation does not alter the documents.
the second $group operator collects the documents in the pipeline by the state field, which is a field inside the nested _id document.
Within each per-state document, this $group operator specifies four fields: using the $last expression, the $group operator creates the biggestCity and biggestPop fields that store the city with the largest population and that population; using the $first expression, it creates the smallestCity and smallestPop fields that store the city with the smallest population and that population.
The documents at this stage in the pipeline resemble the following:
{
"_id" : "WA",
"biggestCity" : "SEATTLE",
"biggestPop" : 520096,
"smallestCity" : "BENGE",
"smallestPop" : 2
}
The final operation is $project, which renames the _id field to state and moves the biggestCity, biggestPop, smallestCity, and smallestPop into biggestCity and smallestCity sub-documents.
The final output of this aggregation operation is:
{
"state" : "RI",
"biggestCity" : {
"name" : "CRANSTON",
"pop" : 176404
},
"smallestCity" : {
"name" : "CLAYVILLE",
"pop" : 45
}
}
Consider a hypothetical sports club with a database that contains a users collection that tracks users' join dates and sport preferences, storing these data in documents that resemble the following:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
{
_id : "joe",
joined : ISODate("2012-07-02"),
likes : ["tennis", "golf", "swimming"]
}
The following operation returns user names in upper case and in alphabetical order. The aggregation includes user names for all documents in the users collection. You might do this to normalize user names for processing.
db.users.aggregate(
[
{ $project : { name:{$toUpper:"$_id"} , _id:0 } },
{ $sort : { name : 1 } }
]
)
All documents from the users collection pass through the pipeline, which consists of the following operations:
The results of the aggregation would resemble the following:
{
"name" : "JANE"
},
{
"name" : "JILL"
},
{
"name" : "JOE"
}
The following aggregation operation returns user names sorted by the month they joined. This kind of aggregation could help generate membership renewal notices.
db.users.aggregate(
[
{ $project :
{
month_joined : { $month : "$joined" },
name : "$_id",
_id : 0
}
},
{ $sort : { month_joined : 1 } }
]
)
The pipeline passes all documents in the users collection through the following operations:
The operation returns results that resemble the following:
{
"month_joined" : 1,
"name" : "ruth"
},
{
"month_joined" : 1,
"name" : "harold"
},
{
"month_joined" : 1,
"name" : "kate"
}
{
"month_joined" : 2,
"name" : "jill"
}
The following operation shows how many people joined each month of the year. You might use this aggregated data to inform recruiting and marketing strategies.
db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)
The pipeline passes all documents in the users collection through a $project operation that computes month_joined, a $group operation that counts the documents for each month, and a $sort on the group _id.
The result of this aggregation operation would resemble the following:
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
The following aggregation collects the five most “liked” activities in the data set. You might use an analysis like this to help inform planning and future development.
db.users.aggregate(
[
{ $unwind : "$likes" },
{ $group : { _id : "$likes" , number : { $sum : 1 } } },
{ $sort : { number : -1 } },
{ $limit : 5 }
]
)
The pipeline begins with all documents in the users collection, and passes these documents through the following operations:
The $unwind operator separates each value in the likes array, and creates a new version of the source document for every element in the array.
Example
Given the following document from the users collection:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
The $unwind operator would create the following documents:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : "golf"
}
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : "racquetball"
}
The $group operator collects all documents with the same value for the likes field and counts each grouping. With this information, $group creates a new document with two fields:
The $sort operator sorts these documents by the number field in reverse order.
The $limit operator only includes the first 5 result documents.
The results of aggregation would resemble the following:
{
"_id" : "golf",
"number" : 33
},
{
"_id" : "racquetball",
"number" : 31
},
{
"_id" : "swimming",
"number" : 24
},
{
"_id" : "handball",
"number" : 19
},
{
"_id" : "tennis",
"number" : 18
}
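As a rough sketch of how the four stages compose, the same logic can be written in plain JavaScript over a small hypothetical data set (illustrative only; the real pipeline runs server-side):

```javascript
// Illustrative in-memory stand-in for the users collection.
var users = [
  { _id: "jane", likes: ["golf", "racquetball"] },
  { _id: "joe",  likes: ["tennis", "golf", "swimming"] }
];

// $unwind: emit one document per element of the likes array.
var unwound = [];
users.forEach(function (doc) {
  doc.likes.forEach(function (like) {
    unwound.push({ _id: doc._id, likes: like });
  });
});

// $group: count the documents that share each likes value.
var counts = {};
unwound.forEach(function (doc) {
  counts[doc.likes] = (counts[doc.likes] || 0) + 1;
});

// $sort (descending on number) and $limit 5.
var topFive = Object.keys(counts)
  .map(function (k) { return { _id: k, number: counts[k] }; })
  .sort(function (a, b) { return b.number - a.number; })
  .slice(0, 5);

// "golf" appears in both documents, so it sorts first with number 2.
```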
New in version 2.1.0.
The aggregation framework provides the ability to project, process, and control the output of a query without using map-reduce. Aggregation uses a syntax that resembles the form of “regular” MongoDB database queries.
These aggregation operations are all accessible by way of the aggregate() method. While all examples in this document use this method, aggregate() is merely a wrapper around the database command aggregate. The following prototype aggregation operations are equivalent:
db.people.aggregate( <pipeline> )
db.people.aggregate( [<pipeline>] )
db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )
These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for the aggregation pipeline definition. aggregate() accepts the stages of the pipeline (i.e. <pipeline>) as an array, or as arguments to the method.
This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior.
See also
Aggregation Framework overview, the Aggregation Framework Documentation Index, and the Aggregation Framework Examples for more information on the aggregation functionality.
Aggregation Operators:
Warning
The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.
Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All examples in this section assume that the aggregation pipeline begins with a collection named article that contains documents that resemble the following:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date() ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The current pipeline operators are:
Reshapes a document stream by renaming, adding, or removing fields. You can also use $project to create computed values or sub-documents.
Use $project to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.
db.article.aggregate(
{ $project : {
title : 1 ,
author : 1 ,
}}
);
This operation includes the title field and the author field in the document that returns from the aggregation pipeline.
Note
The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate(
{ $project : {
_id : 0 ,
title : 1 ,
author : 1
}}
);
Here, the projection excludes the _id field but includes the title and author fields.
Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators. Consider the following example:
db.article.aggregate(
{ $project : {
title : 1,
doctoredPageViews : { $add:["$pageViews", 10] }
}}
);
Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add operator.
Note
You must enclose the expression that defines the computed field in braces, so that the expression is a valid object.
You may also use $project to rename fields. Consider the following example:
db.article.aggregate(
{ $project : {
title : 1 ,
page_views : "$pageViews" ,
bar : "$other.foo"
}}
);
This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents.
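A dotted path such as "$other.foo" simply walks nested documents one key at a time. As an illustrative sketch (not MongoDB's actual implementation), the resolution logic resembles:

```javascript
// Illustrative sketch: resolve a dotted field reference against a document.
function resolvePath(doc, path) {
  return path.split(".").reduce(function (obj, key) {
    return obj === undefined || obj === null ? undefined : obj[key];
  }, doc);
}

var article = { title: "this is my title", other: { foo: 5 } };
var bar = resolvePath(article, "other.foo"); // resolves like "$other.foo"
```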
Finally, you can use the $project to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate(
{ $project : {
title : 1 ,
stats : {
pv : "$pageViews",
foo : "$other.foo",
dpv : { $add:["$pageViews", 10] }
}
}}
);
This projection includes the title field, which places $project into “inclusive” mode. Then, it creates a stats sub-document that contains the pv, foo, and dpv fields defined in the projection.
Provides a query-like interface to filter documents out of the aggregation pipeline. The $match drops documents that do not match the condition from the aggregation pipeline, and it passes documents that match along the pipeline unaltered.
The syntax passed to the $match is identical to the query syntax. Consider the following prototype form:
db.article.aggregate(
{ $match : <match-predicate> }
);
The following example performs a simple field equality test:
db.article.aggregate(
{ $match : { author : "dave" } }
);
This operation only returns documents where the author field holds the value dave. Consider the following example, which performs a range test:
db.article.aggregate(
{ $match : { score : { $gt : 50, $lte : 90 } } }
);
Here, all documents return when the score field holds a value that is greater than 50 and less than or equal to 90.
Note
Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of later processing. If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() or db.collection.findOne().
Warning
You cannot use $where or geospatial operations in $match queries as part of the aggregation pipeline.
Restricts the number of documents that pass through the $limit in the pipeline.
$limit takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:
db.article.aggregate(
{ $limit : 5 }
);
This operation returns only the first 5 documents passed to it by the pipeline. $limit has no effect on the content of the documents it passes.
Skips over the specified number of documents that pass through the $skip in the pipeline before passing all of the remaining input.
$skip takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:
db.article.aggregate(
{ $skip : 5 }
);
This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the content of the documents it passes along the pipeline.
Peels off the elements of an array individually, and returns a stream of documents. $unwind returns one document for every member of the unwound array within every source document. Take the following aggregation command:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
Note
The dollar sign (i.e. $) must precede the field specification handed to the $unwind operator.
In the above aggregation $project selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"OK" : 1
}
A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original “tags” array.
Note
$unwind has the following behaviors:
Groups documents together for the purpose of calculating aggregate values. Practically, $group often supports tasks such as calculating the average page views for each page in a website on a daily basis.
The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. an _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate keys may resemble the following document:
{ _id : { author: '$author', pageViews: '$pageViews', posted: '$posted' } }
With the exception of the _id field, $group cannot output nested documents.
Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.
Consider the following example:
db.article.aggregate(
{ $group : {
_id : "$author",
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}}
);
This groups by the author field and computes two fields. The first, docsPerAuthor, is a counter that adds one for each document with a given author field, using the $sum function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.
Each field defined for the $group must use one of the group aggregation functions listed below to generate its composite value:
Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.
Returns the first value it encounters for its group.
Returns the last value it encounters for its group.
Returns the highest value among all values of the field in all documents selected by this group.
Returns the lowest value among all values of the field in all documents selected by this group.
Returns the average of all the values of the field in all documents selected by this group.
Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.
Returns the sum of all the values for a specified field in the grouped documents, as in the second use above.
Alternately, if you specify a value as an argument, $sum will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.
Warning
The aggregation system currently stores $group operations in memory, which may cause problems when processing a large number of groups.
The $sort pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:
db.<collection-name>.aggregate(
{ $sort : { <sort-key> } }
);
This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document.
Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
db.users.aggregate(
{ $sort : { age : -1, posts: 1 } }
);
This operation sorts the documents in the users collection, in descending order according by the age field and then in ascending order according to the value in the posts field.
Note
The $sort cannot begin sorting documents until previous operators in the pipeline have returned all output.
The $sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators:
Warning
Unless the $sort operator can use an index, in the current release, the sort must fit within memory. This may cause problems when sorting large numbers of documents.
These operators calculate values within the aggregation framework.
The three boolean operators accept Booleans as arguments and return Booleans as results.
Note
These operators convert non-Booleans to Boolean values according to the BSON standards. Here, null, undefined, and 0 values become false, while non-zero numeric values and all other types, such as strings, dates, and objects, become true.
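As a sketch, the conversion rule described in this note resembles the following plain-JavaScript function (illustrative only, not the server's implementation):

```javascript
// Illustrative sketch of the conversion rule: null, undefined, and 0
// become false; every other value (strings, dates, objects, non-zero
// numbers) becomes true.
function toAggregationBoolean(value) {
  return !(value === null || value === undefined || value === 0);
}
```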
Takes an array of one or more values and returns true if all of the values in the array are true. Otherwise $and returns false.
Note
$and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.
These operators perform comparisons between two values and return a Boolean, in most cases, reflecting the result of that comparison.
All comparison operators take an array with a pair of values. You may compare numbers, strings, and dates. Except for $cmp, all comparison operators return a Boolean value. $cmp returns an integer.
Takes two values in an array and returns an integer. The returned value is:
Takes two values in an array and returns a boolean. The returned value is:
Takes two values in an array and returns a Boolean. The returned value is:
Takes two values in an array and returns a Boolean. The returned value is:
Takes two values in an array and returns a Boolean. The returned value is:
Takes two values in an array and returns a Boolean. The returned value is:
Takes two values in an array and returns a Boolean. The returned value is:
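As an illustrative sketch of $cmp, the one comparison operator that returns an integer rather than a Boolean (negative when the first value is less, positive when greater, and 0 when equivalent):

```javascript
// Illustrative sketch of $cmp for two comparable values.
function cmp(a, b) {
  if (a < b) return -1; // negative: first value is less than the second
  if (a > b) return 1;  // positive: first value is greater than the second
  return 0;             // zero: the two values are equivalent
}
```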
These operators only support numbers.
Takes an array of one or more numbers and adds them together, returning the sum.
Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.
Takes an array that contains a pair of numbers and returns the remainder of the first number divided by the second number.
Takes an array of one or more numbers and multiplies them, returning the resulting product.
Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.
These operators manipulate strings within projection expressions.
Takes in two strings. Returns a number. $strcasecmp is positive if the first string is “greater than” the second and negative if the first string is “less than” the second. $strcasecmp returns 0 if the strings are identical.
Note
$strcasecmp may not make sense when applied to glyphs outside the Roman alphabet.
$strcasecmp internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp for a case-sensitive comparison.
$substr takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.
Note
$substr is not encoding aware and if used improperly may produce a result string containing an invalid UTF-8 character sequence.
All date operators take a “Date” typed value as a single argument and return a number.
Takes a date and returns the day of the year as a number between 1 and 366.
Takes a date and returns the day of the month as a number between 1 and 31.
Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday).
Takes a date and returns the full year.
Takes a date and returns the month as a number between 1 and 12.
Takes a date and returns the week of the year as a number between 0 and 53.
Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the “%U” format specifier of the strftime standard library function.
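The “%U” week rule can be sketched in plain JavaScript (an illustrative approximation, using 0-based day-of-year and JavaScript's 0-for-Sunday day-of-week convention rather than $dayOfWeek's 1-to-7 range):

```javascript
// Illustrative sketch of the "%U" week rule. dayOfYear is 0-based;
// dayOfWeek uses JavaScript's convention of 0 for Sunday.
function weekOfYear(date) {
  var startOfYear = new Date(Date.UTC(date.getUTCFullYear(), 0, 1));
  var dayOfYear = Math.floor((date - startOfYear) / 86400000);
  var dayOfWeek = date.getUTCDay();
  // Days before the year's first Sunday land in week 0.
  return Math.floor((dayOfYear + 7 - dayOfWeek) / 7);
}

weekOfYear(new Date(Date.UTC(2012, 0, 1))); // Jan 1, 2012 was a Sunday: week 1
weekOfYear(new Date(Date.UTC(2011, 0, 1))); // Jan 1, 2011 was a Saturday: week 0
```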
Takes a date and returns the hour between 0 and 23.
Takes a date and returns the minute between 0 and 59.
Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.
Use the $cond operator with the following syntax:
{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond returns the value of the second expression. If the first expression evaluates to false, $cond evaluates and returns the third expression.
Map-reduce operations can handle complex aggregation tasks. To perform map-reduce operations, MongoDB provides the mapReduce command and, in the mongo shell, the db.collection.mapReduce() wrapper method.
For many simple aggregation tasks, see the aggregation framework.
This section provides some map-reduce examples in the mongo shell using the db.collection.mapReduce() method:
db.collection.mapReduce(
<mapfunction>,
<reducefunction>,
{
out: <collection>,
query: <document>,
sort: <document>,
limit: <number>,
finalize: <function>,
scope: <document>,
jsMode: <boolean>,
verbose: <boolean>
}
)
For more information on the parameters, see the db.collection.mapReduce() reference page .
Consider the following map-reduce operations on a collection orders that contains documents of the following prototype:
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 250,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
Perform map-reduce operation on the orders collection to group by the cust_id, and for each cust_id, calculate the sum of the price for each cust_id:
Define the map function to process each input document:
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function.
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace the contents with the results of this map-reduce operation.
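The shell sends mapFunction1 and reduceFunction1 to the server, but their contract can be checked locally with a small plain-JavaScript harness. This sketch uses hypothetical sample documents and a plain reduce in place of the shell-only Array.sum helper; note that a real server skips the reduce call for keys with a single emitted value:

```javascript
// Illustrative local harness: apply the map function to each document,
// gather emitted values by key, then reduce each group.
var orders = [
  { cust_id: "abc123", price: 250 },
  { cust_id: "abc123", price: 150 },
  { cust_id: "xyz789", price: 100 }
];

var emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

var mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// Plain reduce in place of the shell-only Array.sum helper.
var reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce(function (a, b) { return a + b; }, 0);
};

orders.forEach(function (doc) { mapFunction1.call(doc); });

var results = {};
Object.keys(emitted).forEach(function (key) {
  results[key] = reduceFunction1(key, emitted[key]);
});
// results resembles: { abc123: 400, xyz789: 100 }
```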
In this example you will perform a map-reduce operation on the orders collection, for all documents that have an ord_date value greater than 01/01/2012. The operation groups by the item.sku field, and for each sku calculates the number of orders and the total quantity ordered. The operation concludes by calculating the average quantity per order for each sku value:
Define the map function to process each input document:
var mapFunction2 = function() {
for (var idx = 0; idx < this.items.length; idx++) {
var key = this.items[idx].sku;
var value = {
count: 1,
qty: this.items[idx].qty
};
emit(key, value);
}
};
Define the corresponding reduce function with two arguments keySKU and valuesCountObjects:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
var reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
Define a finalize function with two arguments key and reducedValue. The function modifies the reducedValue object to add a computed field named average and returns the modified object:
var finalizeFunction2 = function (key, reducedValue) {
reducedValue.average = reducedValue.qty/reducedValue.count;
return reducedValue;
};
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions.
db.orders.mapReduce( mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example" },
query: { ord_date: { $gt: new Date('01/01/2012') } },
finalize: finalizeFunction2
}
)
This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). Then it outputs the results to a collection map_reduce_example. If the map_reduce_example collection already exists, the operation will merge the existing contents with the results of this map-reduce operation.
If the map-reduce dataset is constantly growing, then rather than performing the map-reduce operation over the entire dataset each time you want to run map-reduce, you may want to perform an incremental map-reduce.
To perform incremental map-reduce:
Consider the following example where you schedule a map-reduce operation on a sessions collection to run at the end of each day.
The sessions collection contains documents that log users’ session each day, for example:
db.sessions.save( { userid: "a", ts: ISODate('2011-11-03 14:17:00'), length: 95 } );
db.sessions.save( { userid: "b", ts: ISODate('2011-11-03 14:23:00'), length: 110 } );
db.sessions.save( { userid: "c", ts: ISODate('2011-11-03 15:02:00'), length: 120 } );
db.sessions.save( { userid: "d", ts: ISODate('2011-11-03 16:45:00'), length: 45 } );
db.sessions.save( { userid: "a", ts: ISODate('2011-11-04 11:05:00'), length: 105 } );
db.sessions.save( { userid: "b", ts: ISODate('2011-11-04 13:14:00'), length: 120 } );
db.sessions.save( { userid: "c", ts: ISODate('2011-11-04 17:00:00'), length: 130 } );
db.sessions.save( { userid: "d", ts: ISODate('2011-11-04 15:37:00'), length: 65 } );
Run the first map-reduce operation as follows:
Define the map function that maps the userid to an object that contains the fields userid, total_time, count, and avg_time:
var mapFunction = function() {
var key = this.userid;
var value = {
userid: this.userid,
total_time: this.length,
count: 1,
avg_time: 0
};
emit( key, value );
};
Define the corresponding reduce function with two arguments key and values to calculate the total time and the count. The key corresponds to the userid, and values is an array whose elements correspond to the individual objects mapped to the userid in the mapFunction.
var reduceFunction = function(key, values) {
var reducedObject = {
userid: key,
total_time: 0,
count:0,
avg_time:0
};
values.forEach( function(value) {
reducedObject.total_time += value.total_time;
reducedObject.count += value.count;
}
);
return reducedObject;
};
Define the finalize function with two arguments key and reducedValue. The function computes the avg_time field of the reducedValue document and returns the modified document.
var finalizeFunction = function (key, reducedValue) {
if (reducedValue.count > 0)
reducedValue.avg_time = reducedValue.total_time / reducedValue.count;
return reducedValue;
};
Perform map-reduce on the sessions collection using the mapFunction, the reduceFunction, and the finalizeFunction functions. Output the results to a collection session_stat. If the session_stat collection already exists, the operation will reduce the existing contents with the results of this map-reduce operation:
db.sessions.mapReduce( mapFunction,
reduceFunction,
{
out: { reduce: "session_stat" },
finalize: finalizeFunction
}
)
Later as the sessions collection grows, you can run additional map-reduce operations. For example, add new documents to the sessions collection:
db.sessions.save( { userid: "a", ts: ISODate('2011-11-05 14:17:00'), length: 100 } );
db.sessions.save( { userid: "b", ts: ISODate('2011-11-05 14:23:00'), length: 115 } );
db.sessions.save( { userid: "c", ts: ISODate('2011-11-05 15:02:00'), length: 125 } );
db.sessions.save( { userid: "d", ts: ISODate('2011-11-05 16:45:00'), length: 55 } );
At the end of the day, perform incremental map-reduce on the sessions collection but use the query field to select only the new documents. Output the results to the collection session_stat, but reduce the contents with the results of the incremental map-reduce:
db.sessions.mapReduce( mapFunction,
reduceFunction,
{
query: { ts: { $gt: ISODate('2011-11-05 00:00:00') } },
out: { reduce: "session_stat" },
finalize: finalizeFunction
}
);
The map-reduce operation uses a temporary collection during processing. At completion, the map-reduce operation renames the temporary collection. As a result, you can perform a map-reduce operation periodically with the same target collection name without affecting the intermediate states. Use this mode when generating statistical output collections on a regular basis.
The map-reduce operation is composed of many tasks, including:
These various tasks take the following locks:
The read phase takes a read lock. It yields every 100 documents.
The JavaScript code (i.e. map, reduce, finalize functions) is executed in a single thread, taking a JavaScript lock; however, most JavaScript tasks in map-reduce are very short and yield the lock frequently.
The insert into the temporary collection takes a write lock for a single write.
If the output collection does not exist, the creation of the output collection takes a write lock.
If the output collection exists, then the output actions (i.e. merge, replace, reduce) take a write lock.
Although single-threaded, the map-reduce tasks interleave and appear to run in parallel.
Note
The final write lock during post-processing makes the results appear atomically. However, output actions merge and reduce may take minutes to process. For the merge and reduce, the nonAtomic flag is available. See the db.collection.mapReduce() reference for more information.
When using a sharded collection as the input for a map-reduce operation, mongos will automatically dispatch the map-reduce job to each shard in parallel. There is no special option required. mongos will wait for jobs on all shards to finish.
By default the output collection is not sharded. The process is:
mongos dispatches a map-reduce finish job to the shard that will store the target collection.
The target shard pulls results from all other shards, runs a final reduce/finalize operation, and writes to the output.
If using the sharded option to the out parameter, MongoDB shards the output using the _id field as the shard key.
Changed in version 2.2.
If the output collection does not exist, MongoDB creates and shards the collection on the _id field. If the collection is empty, MongoDB creates chunks using the result of the first stage of the map-reduce operation.
mongos dispatches, in parallel, a map-reduce finish job to every shard that owns a chunk.
Each shard will pull the results it owns from all other shards, run a final reduce/finalize, and write to the output collection.
Warning
For best results, only use the sharded output options for mapReduce in version 2.2 or later.
You can troubleshoot the map function and the reduce function in the mongo shell.
You can verify the key and value pairs emitted by the map function by writing your own emit function.
Consider a collection orders that contains documents of the following prototype:
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 250,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
Define the map function that maps the price to the cust_id for each document and emits the cust_id and price pair:
var map = function() {
emit(this.cust_id, this.price);
};
Define the emit function to print the key and value:
var emit = function(key, value) {
print("emit");
print("key: " + key + " value: " + tojson(value));
}
Invoke the map function with a single document from the orders collection:
var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } );
map.apply(myDoc);
Verify the key and value pair is as you expected.
emit
key: abc123 value: 250
Invoke the map function with multiple documents from the orders collection:
var myCursor = db.orders.find( { cust_id: "abc123" } );
while (myCursor.hasNext()) {
var doc = myCursor.next();
print ("document _id= " + tojson(doc._id));
map.apply(doc);
print();
}
Verify the key and value pairs are as you expected.
You can test that the reduce function returns a value that is the same type as the value emitted from the map function.
Define a reduceFunction1 function that takes the arguments keyCustId and valuesPrices. valuesPrices is an array of integers:
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
Define a sample array of integers:
var myTestValues = [ 5, 5, 10 ];
Invoke the reduceFunction1 with myTestValues:
reduceFunction1('myKey', myTestValues);
Verify the reduceFunction1 returned an integer:
20
Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects. valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
var reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
Define a sample array of documents:
var myTestObjects = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
{ count: 3, qty: 15 }
];
Invoke the reduceFunction2 with myTestObjects:
reduceFunction2('myKey', myTestObjects);
Verify the reduceFunction2 returned a document with exactly the count and the qty field:
{ "count" : 6, "qty" : 30 }
The reduce function takes a key and a values array as its arguments. You can test that the result of the reduce function does not depend on the order of the elements in the values array.
Define a sample values1 array and a sample values2 array that only differ in the order of the array elements:
var values1 = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
{ count: 3, qty: 15 }
];
var values2 = [
{ count: 3, qty: 15 },
{ count: 1, qty: 5 },
{ count: 2, qty: 10 }
];
Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects. valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
var reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
Invoke the reduceFunction2 first with values1 and then with values2:
reduceFunction2('myKey', values1);
reduceFunction2('myKey', values2);
Verify the reduceFunction2 returned the same result:
{ "count" : 6, "qty" : 30 }
Because the map-reduce operation may call the reduce function multiple times for the same key, feeding previously reduced values back in as input, the reduce function must be idempotent. You can test that the reduce function processes “reduced” values without affecting the final value.
Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects. valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
var reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
Define a sample key:
var myKey = 'myKey';
Define a sample valuesIdempotent array that contains an element that is a call to the reduceFunction2 function:
var valuesIdempotent = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
reduceFunction2(myKey, [ { count:3, qty: 15 } ] )
];
Define a sample values1 array that combines the values passed to reduceFunction2:
var values1 = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
{ count: 3, qty: 15 }
];
Invoke the reduceFunction2 first with myKey and valuesIdempotent and then with myKey and values1:
reduceFunction2(myKey, valuesIdempotent);
reduceFunction2(myKey, values1);
Verify the reduceFunction2 returned the same result:
{ "count" : 6, "qty" : 30 }
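The three properties exercised above — a consistent result type, order independence, and idempotence — can be checked together with a small plain-JavaScript harness. This is a sketch that runs in any JavaScript engine, no mongo shell required; the sample values mirror the arrays used above:

```javascript
// Reduce function under test (same logic as reduceFunction2 above).
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

// Compare results by value, not by reference.
function sameResult(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

var values = [ { count: 1, qty: 5 }, { count: 2, qty: 10 }, { count: 3, qty: 15 } ];
var shuffled = [ values[2], values[0], values[1] ];

// Order independence: any permutation of the values array reduces the same way.
var orderIndependent = sameResult(
  reduceFunction2('myKey', values),
  reduceFunction2('myKey', shuffled)
);

// Idempotence: feeding an already-reduced value back in changes nothing.
var idempotent = sameResult(
  reduceFunction2('myKey', values),
  reduceFunction2('myKey', [ values[0], values[1],
                             reduceFunction2('myKey', [ values[2] ]) ])
);

console.log(orderIndependent && idempotent); // prints true
```

Any reduce function you plan to pass to mapReduce can be dropped into the same harness in place of reduceFunction2.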
In addition to the aggregation framework, MongoDB provides simple aggregation methods and commands that you may find useful for some classes of tasks:
In addition to the aggregation framework and map-reduce, MongoDB provides the following methods and commands to perform aggregation:
MongoDB offers the following command and methods to provide count functionality:
MongoDB offers the following command and method to provide the distinct functionality:
MongoDB offers the following command and method to provide group functionality:
Indexes provide high performance read operations for frequently used queries. Indexes are particularly useful where the total size of the documents exceeds the amount of available RAM.
For basic concepts and options, see Indexing Overview. For procedures and operational concerns, see Indexing Operations. For information on how applications might use indexes, see Indexing Strategies.
The following outlines the indexing documentation:
This document provides an overview of indexes in MongoDB, including index types and creation options. For operational guidelines and procedures, see the Indexing Operations document. For strategies and practical approaches, see the Indexing Strategies document.
An index is a data structure that allows you to quickly locate documents based on the values stored in certain specified fields. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
MongoDB indexes have the following core features:
This section enumerates the types of indexes available in MongoDB. For all collections, MongoDB creates the default _id index. You can create additional indexes with the ensureIndex() method on any single field or sequence of fields within any document or sub-document. MongoDB also supports indexes of arrays, called multi-key indexes.
The _id index is a unique index [1] on the _id field, and MongoDB creates this index by default on all collections. [2] You cannot delete the index on _id.
The _id field is the primary key for the collection, and every document must have a unique _id field. You may store any unique value in the _id field. The default value of _id is an ObjectId, which MongoDB generates for each insert() operation. An ObjectId is a 12-byte unique identifier suitable for use as the value of an _id field.
Note
In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the uniqueness of the values in the _id field to prevent errors. This is most-often done by using a standard auto-generated ObjectId.
| [1] | Although the index on _id is unique, the getIndexes() method will not print unique: true in the mongo shell. |
| [2] | Before version 2.2 capped collections did not have an _id field. In 2.2, all capped collections have an _id field, except those in the local database. See the release notes for more information. |
All indexes in MongoDB are secondary indexes. You can create indexes on any field within any document or sub-document. Additionally, you can create compound indexes with multiple fields, so that a single query can match multiple components using the index while scanning fewer whole documents.
In general, you should create indexes that support your primary, common, and user-facing queries. Doing so requires MongoDB to scan the fewest number of documents possible.
In the mongo shell, you can create an index by calling the ensureIndex() method. Arguments to ensureIndex() resemble the following:
{ "field": 1 }
{ "product.quantity": 1 }
{ "product": 1, "quantity": 1 }
For each field in the index specify either 1 for an ascending order or -1 for a descending order, which represents the order of the keys in the index. For indexes with more than one key (i.e. compound indexes) the sequence of fields is important.
You can create indexes on fields that hold sub-documents as in the following example:
Example
Given the following document in the factories collection:
{ "_id": ObjectId(...), metro: { city: "New York", state: "NY" } }
You can create an index on the metro key. The following queries would then use that index, and both would return the above document:
db.factories.find( { metro: { city: "New York", state: "NY" } } );
db.factories.find( { metro: { $gte : { city: "New York" } } } );
The second query returns the document because { city: "New York" } is less than { city: "New York", state: "NY" }. The comparison proceeds in ascending key order, comparing the keys in the order they occur in the BSON document.
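This field-by-field comparison can be sketched in plain JavaScript: fields are compared one at a time in the order they appear in the document, and a document that is a prefix of another sorts first. This is a simplified model of BSON document comparison (the real rules also order values by BSON type):

```javascript
// Simplified document comparison: walk fields in document order and
// compare values pairwise; a document that runs out of fields first
// (i.e. is a prefix of the other) sorts first.
function compareDocs(a, b) {
  var aKeys = Object.keys(a), bKeys = Object.keys(b);
  var n = Math.max(aKeys.length, bKeys.length);
  for (var i = 0; i < n; i++) {
    if (i >= aKeys.length) return -1;   // a is a prefix of b: a sorts first
    if (i >= bKeys.length) return 1;
    if (a[aKeys[i]] < b[bKeys[i]]) return -1;
    if (a[aKeys[i]] > b[bKeys[i]]) return 1;
  }
  return 0;
}

// { city: "New York" } compares less than { city: "New York", state: "NY" },
// which is why the $gte query above matches the stored document.
var result = compareDocs({ city: "New York" },
                         { city: "New York", state: "NY" });
console.log(result); // prints -1
```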
You can create indexes on fields in sub-documents, just as you can index top-level fields in documents. [3] These indexes allow you to use a “dot notation,” to introspect into sub-documents.
Consider a collection named people that holds documents that resemble the following example document:
{
  "_id": ObjectId(...),
  "name": "John Doe",
  "address": {
    "street": "Main",
    "zipcode": 53511,
    "state": "WI"
  }
}
You can create an index on the address.zipcode field, using the following specification:
db.people.ensureIndex( { "address.zipcode": 1 } )
| [3] | Indexes on sub-documents, by contrast, allow you to index fields that hold documents, including the full content of the sub-document in the index, up to the maximum Index Size. |
MongoDB supports “compound indexes,” where a single index structure holds references to multiple fields within a collection’s documents. Consider a collection named products that holds documents that resemble the following document:
{
  "_id": ObjectId(...),
  "item": "Banana",
  "category": ["food", "produce", "grocery"],
  "location": "4th Street Store",
  "stock": 4,
  "type": "cases",
  "arrival": Date(...)
}
If most of your application’s queries include the item field and a significant number also check the stock field, you can specify a single compound index to support both of these queries:
db.products.ensureIndex( { "item": 1, "location": 1, "stock": 1 } )
Compound indexes support queries on any prefix of the fields in the index. [4] For example, MongoDB can use the above index to support queries that select the item field and to support queries that select both the item field and the location field. The index, however, would not support queries that select only the location field, only the stock field, or the location and stock fields without the item field.
When creating an index, the number associated with a key specifies the direction of the index. The options are 1 (ascending) and -1 (descending). Direction doesn’t matter for single key indexes or for random access retrieval but is important if you are doing sort queries on compound indexes.
The order of fields in a compound index is very important. In the previous example, the index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by the values of location, and then sorted by values of the stock field.
| [4] | Index prefixes are the beginning subset of fields. For example, given the index { a: 1, b: 1, c: 1 } both { a: 1 } and { a: 1, b: 1 } are prefixes of the index. |
Indexes store references to fields in either ascending or descending order. For single-field indexes, the order of keys doesn’t matter, because MongoDB can traverse the index in either direction. However, for compound indexes, if you need to order results against two fields, sometimes you need the index fields running in opposite order relative to each other.
To specify an index with a descending order, use the following form:
db.products.ensureIndex( { "field": -1 } )
More typically in the context of a compound index, the specification would resemble the following prototype:
db.products.ensureIndex( { "fieldA": 1, "fieldB": -1 } )
Consider a collection of event data that includes both usernames and a timestamp. Suppose you want to return a list of events sorted by username, with the most recent events first. To create this index, use the following command:
db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )
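The ordering that the { "username" : 1, "timestamp" : -1 } index stores can be sketched as a plain-JavaScript comparator. The sample event data below is hypothetical; only the field names come from the example above:

```javascript
// Comparator mirroring an index on { username: 1, timestamp: -1 }:
// ascending by username, and within each username, descending by timestamp.
function byUsernameThenNewest(a, b) {
  if (a.username < b.username) return -1;
  if (a.username > b.username) return 1;
  return b.timestamp - a.timestamp;   // larger (newer) timestamps first
}

var events = [
  { username: "bob",   timestamp: 100 },
  { username: "alice", timestamp: 300 },
  { username: "alice", timestamp: 500 }
];

events.sort(byUsernameThenNewest);
console.log(events.map(function (e) { return e.username + ":" + e.timestamp; }));
// prints [ 'alice:500', 'alice:300', 'bob:100' ]
```

Because the index already stores entries in this order, MongoDB can return the sorted result without a separate sort stage.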
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a “multikey index.”
Example
Given the following document:
{ "_id" : ObjectId("..."),
"name" : "Warm Weather",
"author" : "Steve",
"tags" : [ "weather", "hot", "record", "april" ] }
Then an index on the tags field would be a multikey index and would include these separate entries:
{ tags: "weather" }
{ tags: "hot" }
{ tags: "record" }
{ tags: "april" }
Queries could use the multikey index to select documents matching any of the above values.
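The expansion of an array field into separate index entries can be sketched in plain JavaScript (illustrative only; this models the entries, not the server’s B-tree):

```javascript
// For an index on `field`, emit one index entry per array element
// (or a single entry when the value is not an array).
function indexEntries(doc, field) {
  var value = doc[field];
  var values = Array.isArray(value) ? value : [ value ];
  return values.map(function (v) {
    var entry = {};
    entry[field] = v;
    return entry;
  });
}

var doc = { name: "Warm Weather", tags: [ "weather", "hot", "record", "april" ] };
console.log(indexEntries(doc, "tags"));
// prints [ { tags: 'weather' }, { tags: 'hot' }, { tags: 'record' }, { tags: 'april' } ]
```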
You can use multikey indexes to index fields within objects embedded in arrays, as in the following example:
Example
Consider a feedback collection with documents in the following form:
{
  "_id": ObjectId(...),
  "title": "Grocery Quality",
  "comments": [
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the cheddar selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the mustard selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the olive selection." }
  ]
}
An index on the comments.text field would be a multikey index and would add items to the index for all of the sub-documents in the array.
Given an index such as { "comments.text": 1 }, consider the following query:
db.feedback.find( { "comments.text": "Please expand the olive selection." } )
This query would select the documents whose comments array contains a sub-document with the following content:
{ author_id: ObjectId(...),
  date: Date(...),
  text: "Please expand the olive selection." }
Compound Multikey Indexes May Only Include One Array Field
While you can create multikey compound indexes, at most one field in a compound index may hold an array. For example, given an index on { a: 1, b: 1 }, the following documents are permissible:
{a: [1, 2], b: 1}
{a: 1, b: [1, 2]}
However, the following document is impermissible, and MongoDB cannot insert such a document into a collection with the {a: 1, b: 1 } index:
{a: [1, 2], b: [1, 2]}
If you attempt to insert such a document, MongoDB will reject the insertion and produce an error that says cannot index parallel arrays. MongoDB does not index parallel arrays because doing so would require an index entry for each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and difficult-to-maintain indexes.
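The Cartesian-product blowup that this restriction guards against can be counted directly. A plain-JavaScript sketch (illustrative, not the server implementation): the number of compound index entries a document would need is the product of the number of values held by each indexed field:

```javascript
// Entries a compound index would need for one document: the product of
// the value counts of each indexed field (arrays contribute each element).
function entryCount(doc, fields) {
  return fields.reduce(function (n, f) {
    var v = doc[f];
    return n * (Array.isArray(v) ? v.length : 1);
  }, 1);
}

// Permissible: only one array field, so entries grow linearly.
console.log(entryCount({ a: [1, 2], b: 1 }, [ "a", "b" ]));      // prints 2

// Impermissible parallel arrays: entries would be the Cartesian product.
console.log(entryCount({ a: [1, 2], b: [1, 2] }, [ "a", "b" ])); // prints 4
```

With larger arrays the product grows multiplicatively, which is why MongoDB rejects such documents outright.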
A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field. To create a unique index on the user_id field of the members collection, use the following operation in the mongo shell:
db.members.ensureIndex( { "user_id": 1 }, { unique: true } )
By default, unique is false on MongoDB indexes.
If you use the unique constraint on a compound index then MongoDB will enforce uniqueness on the combination of values, rather than the individual value for any or all values of the key.
If a document does not have a value for the indexed field in a unique index, the index will store a null value for this document. Because of the unique constraint, MongoDB will permit only one document that lacks the indexed field. You can combine the unique constraint with the sparse index option to filter these null values from the unique index.
Sparse indexes only contain entries for documents that have the indexed field. [5] Any document that is missing the field is not indexed. The index is “sparse” because it does not include every document in the collection.
By contrast, non-sparse indexes contain all documents in a collection, and store null values for documents that do not contain the indexed field. Create a sparse index on the xmpp_id field of the members collection using the following operation in the mongo shell:
db.members.ensureIndex( { "xmpp_id": 1 }, { sparse: true } )
By default, sparse is false on MongoDB indexes.
Warning
Using sparse indexes will sometimes produce incomplete result sets when filtering or sorting, because sparse indexes do not contain entries for all documents in a collection.
Note
Do not confuse sparse indexes in MongoDB with block-level indexes in other databases. Think of them as dense indexes with a specific filter.
You can combine the sparse index option with the unique index option so that mongod will reject documents that have duplicate values for the indexed field while ignoring documents that do not have the field.
| [5] | All documents that have the indexed field are indexed in a sparse index, even if that field stores a null value in some documents. |
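The interaction between unique and sparse described above can be sketched with a small simulation: a non-sparse unique index stores null for a missing field and therefore rejects a second document without it, while a sparse unique index skips such documents entirely. This is illustrative JavaScript, not the server implementation; the sample documents are hypothetical:

```javascript
// Simulate inserting documents under a unique index on `field`.
// Returns the list of rejected documents.
function insertAll(docs, field, sparse) {
  var seen = {};          // index keys already stored
  var rejected = [];
  docs.forEach(function (doc) {
    var hasField = field in doc;
    if (!hasField && sparse) return;         // sparse: not indexed at all
    var key = hasField ? String(doc[field]) : "__null__";
    if (key in seen) rejected.push(doc);     // duplicate key: reject
    else seen[key] = true;
  });
  return rejected;
}

var docs = [ { xmpp_id: "a" }, { other: 1 }, { other: 2 } ];

// Non-sparse unique index: the second missing-field document is rejected.
console.log(insertAll(docs, "xmpp_id", false).length); // prints 1

// Sparse unique index: missing-field documents are simply left out.
console.log(insertAll(docs, "xmpp_id", true).length);  // prints 0
```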
You specify index creation options in the second argument in ensureIndex().
The options sparse, unique, and TTL affect the kind of index that MongoDB creates. This section addresses background construction and duplicate dropping, which affect how MongoDB builds the indexes.
By default, creating an index is a blocking operation. Building an index on a large collection of data can take a long time to complete. To resolve this issue, the background option allows you to continue to use your mongod instance during the index build.
For example, to create an index in the background of the zipcode field of the people collection you would issue the following:
db.people.ensureIndex( { zipcode: 1}, {background: true} )
By default, background is false for building MongoDB indexes.
You can combine the background option with other options, as in the following:
db.people.ensureIndex( { zipcode: 1}, {background: true, sparse: true } )
Be aware of the following behaviors with background index construction:
A mongod instance can build only one background index per database at a time.
Changed in version 2.2: Before 2.2, a single mongod instance could only build one index at a time.
The indexing operation runs in the background so that other database operations can run while creating the index. However, the mongo shell session or connection where you are creating the index will block until the index build is complete. Open another connection or mongo shell instance to continue issuing commands to the database.
The background index operation uses an incremental approach that is slower than normal “foreground” index builds. If the index is larger than the available RAM, the incremental process can take much longer than a foreground build.
If your application includes ensureIndex() operations and the index does not already exist when the call runs, building the index at that point can have a severe impact on the performance of the database.
Make sure that your application checks for the indexes at startup, using the getIndexes() method or the equivalent method for your driver, and terminates if the proper indexes do not exist. Always build indexes in production instances using separate application code, during designated maintenance windows.
Building Indexes on Secondaries
Background index operations on a replica set primary become foreground indexing operations on secondary members of the set. All indexing operations on secondaries block replication.
To build large indexes on secondaries the best approach is to restart one secondary at a time in standalone mode and build the index. After building the index, restart as a member of the replica set, allow it to catch up with the other members of the set, and then build the index on the next secondary. When all the secondaries have the new index, step down the primary, restart it as a standalone, and build the index on the former primary.
Remember, the amount of time required to build the index on a secondary node must be within the window of the oplog, so that the secondary can catch up with the primary.
See Build Indexes on Replica Sets for more information on this process.
Indexes on secondary members in “recovering” mode are always built in the foreground to allow them to catch up as soon as possible.
See Build Indexes on Replica Sets for a complete procedure for rebuilding indexes on secondaries.
Note
If MongoDB is building an index in the background, you cannot perform other administrative operations involving that collection, including repairDatabase, drop that collection (i.e. db.collection.drop(),) and compact. These operations will return an error during background index builds.
Queries will not use these indexes until the index build is complete.
MongoDB cannot create a unique index on a field that has duplicate values. To force the creation of a unique index, you can specify the dropDups option, which will index only the first occurrence of a value for the key and delete all subsequent documents that contain a duplicate value.
Warning
As in all unique indexes, if a document does not have the indexed field, MongoDB will include it in the index with a “null” value.
If subsequent documents do not have the indexed field and you have set { dropDups: true }, MongoDB will remove these documents from the collection when creating the index. If you combine dropDups with the sparse option, the index will include only documents that have the indexed field, and the documents without the field will remain in the database.
To create a unique index that drops duplicates on the username field of the accounts collection, use a command in the following form:
db.accounts.ensureIndex( { username: 1 }, { unique: true, dropDups: true } )
Warning
Specifying { dropDups: true } will delete data from your database. Use with extreme caution.
By default, dropDups is false.
TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. This is ideal for some types of information like machine generated event data, logs, and session information that only need to persist in a database for a limited amount of time.
These indexes have the following limitations:
Note
TTL indexes expire data by removing documents in a background task that runs once a minute. As a result, the TTL index provides no guarantees that expired documents will not exist in the collection. Consider that:
In all other respects, TTL indexes are normal indexes, and if appropriate, MongoDB can use these indexes to fulfill arbitrary queries.
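The background removal task’s policy can be modeled simply: a document expires once its indexed date field is more than expireAfterSeconds in the past. This is a plain-JavaScript sketch of the expiry rule, not the server’s implementation, and the session documents are hypothetical:

```javascript
// Return the documents that a TTL index on `dateField` with the given
// expireAfterSeconds would remove at time `now` (milliseconds since epoch).
function expiredDocs(docs, dateField, expireAfterSeconds, now) {
  return docs.filter(function (doc) {
    return now - doc[dateField] > expireAfterSeconds * 1000;
  });
}

var now = Date.now();
var sessions = [
  { _id: 1, createdAt: now - 7200 * 1000 },  // two hours old
  { _id: 2, createdAt: now -  600 * 1000 }   // ten minutes old
];

// With a one-hour TTL, only the first document has expired.
var removed = expiredDocs(sessions, "createdAt", 3600, now);
console.log(removed.map(function (d) { return d._id; })); // prints [ 1 ]
```

Because the real task runs only once a minute, a document may survive briefly past its nominal expiry, which is the lack of guarantee the note above describes.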
MongoDB provides “geospatial indexes” to support location-based and other similar queries in a two-dimensional coordinate system. For example, use geospatial indexes when you need to take a collection of documents that have coordinates and return the documents that are “near” a given coordinate pair.
To create a geospatial index, your documents must have a coordinate pair. For maximum compatibility, these coordinate pairs should be in the form of a two-element array, such as [ x, y ]. Given a field loc that holds a coordinate pair, in the collection places, you would create a geospatial index as follows:
db.places.ensureIndex( { loc : "2d" } )
MongoDB will reject documents that have values in the loc field beyond the minimum and maximum values.
Note
MongoDB permits only one geospatial index per collection. Although MongoDB will allow clients to create multiple geospatial indexes, a single query can use only one index.
See the $near, and the database command geoNear for more information on accessing geospatial data.
In addition to conventional geospatial indexes, MongoDB also provides a bucket-based geospatial index, called “geospatial haystack indexes.” These indexes support high performance queries for locations within a small area, when the query must filter along another dimension.
Example
If you need to return all documents that have coordinates within 25 miles of a given point and have a type field value of “museum,” a haystack index would provide the best support for these queries.
Haystack indexes allow you to tune your bucket size to the distribution of your data, so that in general you search only very small regions of 2d space for a particular kind of document. These indexes are not suited for finding the closest documents to a particular location, when the closest documents are far away compared to bucket size.
Be aware of the following behaviors and limitations:
A collection may have no more than 64 indexes.
Index keys can be no larger than 1024 bytes. This includes the field value or values, the field name or names, and the namespace.
Documents with fields that have values greater than this size cannot be indexed.
To query for documents that were too large to index, you can use a command similar to the following:
db.myCollection.find({<key>: <value too large to index>}).hint({$natural: 1})
The name of an index, including the namespace, must be shorter than 128 characters.
Indexes have storage requirements and impact insert and update speed to some degree.
Create indexes to support queries and other operations, but do not maintain indexes that your MongoDB instance cannot or will not use.
This document provides operational guidelines and procedures for indexing data in MongoDB collections. For the fundamentals of MongoDB indexing, see the Indexing Overview document. For strategies and practical approaches, see the Indexing Strategies document.
Indexes allow MongoDB to process and fulfill queries quickly by creating small and efficient representations of the documents in a collection.
To create an index, use db.collection.ensureIndex() or a similar method from your driver. For example, the following creates an index on the phone-number field of the people collection:
db.people.ensureIndex( { "phone-number": 1 } )
ensureIndex() only creates an index if an index of the same specification does not already exist.
An index supports and optimizes the performance of queries that select on the indexed field. For queries that cannot use an index, MongoDB must scan all documents in a collection for documents that match the query.
Example
If you create an index on the user_id field of the records collection, the index will support the following query:
db.records.find( { user_id: 2 } )
However, the following query, on the profile_url field, is not supported by this index:
db.records.find( { profile_url: 2 } )
If your collection holds a large amount of data, consider building the index in the background, as described in Background Construction. To build indexes on replica sets, see the Build Indexes on Replica Sets section for more information.
To create a compound index use an operation that resembles the following prototype:
db.collection.ensureIndex( { a: 1, b: 1, c: 1 } )
For example, the following operation will create an index on the item, category, and price fields of the products collection:
db.products.ensureIndex( { item: 1, category: 1, price: 1 } )
Some drivers may specify indexes using NumberLong(1) rather than 1 as the specification. This does not have any effect on the resulting index.
Note
To build or rebuild indexes for a replica set see Build Indexes on Replica Sets.
If your collection is large, build the index in the background, as described in Background Construction. If you build in the background on a live replica set, see also Build Indexes on Replica Sets.
Note
TTL collections use a special expire index option. See Expire Data from Collections by Setting TTL for more information.
To create a sparse index on a field, use an operation that resembles the following prototype:
db.collection.ensureIndex( { a: 1 }, { sparse: true } )
The following example creates a sparse index on the users collection that indexes a document’s twitter_name field only if the document has this field. The index will not include documents in this collection that lack the twitter_name field.
db.users.ensureIndex( { twitter_name: 1 }, { sparse: true } )
Note
Sparse indexes can affect the results returned by the query, particularly with respect to sorts on fields not included in the index. See the sparse index section for more information.
To create a unique index, consider the following prototype:
db.collection.ensureIndex( { a: 1 }, { unique: true } )
For example, you may want to create a unique index on the "tax-id" field of the accounts collection to prevent storing multiple account records for the same legal entity:
db.accounts.ensureIndex( { "tax-id": 1 }, { unique: true } )
The _id index is a unique index. In some situations you may consider using the _id field itself for this kind of data rather than using a unique index on another field.
In many situations you will want to combine the unique constraint with the sparse option. When MongoDB indexes a field, if a document does not have a value for a field, the index entry for that item will be null. Since unique indexes cannot have duplicate values for a field, without the sparse option, MongoDB will reject the second document and all subsequent documents without the indexed field. Consider the following prototype.
db.collection.ensureIndex( { a: 1 }, { unique: true, sparse: true } )
You can also enforce a unique constraint on compound indexes, as in the following prototype:
db.collection.ensureIndex( { a: 1, b: 1 }, { unique: true } )
These indexes enforce uniqueness for the combination of index keys and not for either key individually.
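Combination-level uniqueness can be sketched with plain JavaScript: the index key is the tuple of both field values, so only the pair must be unique. This is illustrative only; the sample documents and rejection logic are hypothetical stand-ins for the server’s behavior:

```javascript
// Build the compound index key for a document under { a: 1, b: 1 }:
// the tuple of both field values.
function compoundKey(doc) {
  return JSON.stringify([ doc.a, doc.b ]);
}

var seen = {};
var docs = [ { a: 1, b: 1 }, { a: 1, b: 2 }, { a: 1, b: 1 } ];

// Duplicate `a` values alone are fine; only a repeated (a, b) pair is rejected.
var rejected = docs.filter(function (doc) {
  var key = compoundKey(doc);
  if (key in seen) return true;     // duplicate pair: rejected
  seen[key] = true;
  return false;
});

console.log(rejected.length); // prints 1  (only the repeated { a: 1, b: 1 })
```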
To create an index in the background you can specify background construction. Consider the following prototype invocation of db.collection.ensureIndex():
db.collection.ensureIndex( { a: 1 }, { background: true } )
Consider the section on background index construction for more information about these indexes and their implications.
To force the creation of a unique index on a collection with duplicate values in the field you are indexing, you can use the dropDups option. This forces MongoDB to create a unique index by deleting documents with duplicate values when building the index. Consider the following prototype invocation of db.collection.ensureIndex():
db.collection.ensureIndex( { a: 1 }, { unique: true, dropDups: true } )
See the full documentation of duplicate dropping for more information.
Warning
Specifying { dropDups: true } may delete data from your database. Use with extreme caution.
Refer to the ensureIndex() documentation for additional index creation options.
To return a list of all indexes on a collection, use the db.collection.getIndexes() method or a similar method for your driver.
For example, to view all indexes on the people collection:
db.people.getIndexes()
To return a list of all indexes on all collections in a database, use the following operation in the mongo shell:
db.system.indexes.find()
Query performance is a good general indicator of index use; however, for more precise insight into index use, MongoDB provides the following tools:
Append the explain() method to any cursor (e.g. query) to return a document with statistics about the query process, including the index used, the number of documents scanned, and the time the query takes to process in milliseconds.
Append the hint() to any cursor (e.g. query) with the index as the argument to force MongoDB to use a specific index to fulfill the query. Consider the following example:
db.people.find( { name: "John Doe", zipcode: { $gt: 63000 } } ).hint( { zipcode: 1 } )
You can use hint() and explain() in conjunction with each other to compare the effectiveness of a specific index. Specify the $natural operator to the hint() method to prevent MongoDB from using any index:
db.people.find( { name: "John Doe", zipcode: { $gt: 63000 } } ).hint( { $natural: 1 } )
Use the indexCounters data in the output of serverStatus for insight into database-wide index utilization.
To remove an index, use the db.collection.dropIndex() method, as in the following example:
db.accounts.dropIndex( { "tax-id": 1 } )
This will remove the index on the "tax-id" field in the accounts collection. The shell provides the following document after completing the operation:
{ "nIndexesWas" : 3, "ok" : 1 }
The value of nIndexesWas reflects the number of indexes before removing this index. You can also use db.collection.dropIndexes() to remove all indexes from a collection, except for the _id index.
These shell helpers provide wrappers around the dropIndexes database command. Your client library may have a different or additional interface for these operations.
If you need to rebuild indexes for a collection you can use the db.collection.reIndex() method. This will drop all indexes, including the _id index, and then rebuild all indexes. The operation takes the following form:
db.accounts.reIndex()
MongoDB will return the following document when the operation completes:
{
"nIndexesWas" : 2,
"msg" : "indexes dropped for collection",
"nIndexes" : 2,
"indexes" : [
{
"key" : {
"_id" : 1,
"tax-id" : 1
},
"ns" : "records.accounts",
"name" : "_id_"
}
],
"ok" : 1
}
This shell helper provides a wrapper around the reIndex database command. Your client library may have a different or additional interface for this operation.
Note
To build or rebuild indexes for a replica set see Build Indexes on Replica Sets.
Background index creation operations become foreground indexing operations on secondary members of replica sets. The foreground index building process blocks all replication and read operations on the secondaries while they build the index.
Secondaries will begin building indexes after the primary finishes building the index. In sharded clusters, the mongos will send ensureIndex() to the primary members of the replica set for each shard, which then replicate to the secondaries after the primary finishes building the index.
To minimize the impact of building an index on your replica set, use the following procedure to build indexes on secondaries:
Note
If you need to build an index in a sharded cluster, repeat the following procedure for each replica set that provides each shard.
Warning
Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without falling too far behind to catch up. See the “oplog sizing” documentation for additional information.
Note
This procedure takes one member out of the replica set at a time. However, it affects only one member of the set at a time rather than all secondaries at once.
| [1] | By running the mongod on a different port, you ensure that the other members of the replica set and all clients will not contact the member while you are building the index. |
To see the status of the indexing processes, you can use the db.currentOp() method in the mongo shell. The value of the query field and the msg field will indicate if the operation is an index build. The msg field also indicates the percent of the build that is complete.
You can only terminate a background index build. If you need to terminate an ongoing index build, you can use the db.killOp() method in the mongo shell.
This document provides strategies for indexing in MongoDB. For fundamentals of MongoDB indexing, see Indexing Overview. For operational guidelines and procedures, see Indexing Operations.
The best indexes for your application are based on a number of factors, including the kinds of queries you expect, the ratio of reads to writes, and the amount of free memory on your system.
When developing your indexing strategy, you should have a deep understanding of your application's queries, the ratio of reads to writes, and the amount of free memory on your system.
The best overall strategy for designing indexes is to profile a variety of index configurations with data sets similar to the ones you’ll be running in production to see which configurations perform best.
MongoDB can only use one index to support any given operation. However, each clause of an $or query can use its own index.
If you only ever query on a single key in a given collection, then you need to create just one single-key index for that collection. For example, you might create an index on category in the product collection:
db.products.ensureIndex( { "category": 1 } )
However, if you sometimes query on only one key and at other times query on that key combined with a second key, then creating a compound index is more efficient. MongoDB will use the compound index for both queries. For example, you might create an index on both category and item:
db.products.ensureIndex( { "category": 1, "item": 1 } )
This allows you both options. You can query on just category, and you also can query on category combined with item. (To query on multiple keys and sort the results, see Use Indexes to Sort Query Results.)
With the exception of queries that use the $or operator, a query can use only one index.
A single compound index on multiple fields can support all the queries that search a “prefix” subset of those fields.
Example
The following index on a collection:
{ x: 1, y: 1, z: 1 }
Can support queries that the following indexes support:
{ x: 1 }
{ x: 1, y: 1 }
There are some situations where the prefix indexes may offer better query performance: for example, if z is a large array.
The { x: 1, y: 1, z: 1 } index can also support many of the same queries as the following index:
{ x: 1, z: 1 }
Also, { x: 1, z: 1 } has an additional use. Given the following query:
db.collection.find( { x: 5 } ).sort( { z: 1} )
The { x: 1, z: 1 } index supports both the query and the sort operation, while the { x: 1, y: 1, z: 1 } index only supports the query. For more information on sorting, see Use Indexes to Sort Query Results.
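The prefix rule above can be sketched in a few lines. The following hypothetical helper (names and logic are illustrative, not MongoDB internals) checks whether a compound index can support a query that filters on a given set of fields:

```javascript
// Illustrative sketch of the "prefix" rule for compound indexes.
// indexFields: the index key fields in order; queryFields: fields the query filters on.
// The index can support the query if the queried fields form a prefix of the index.
function supportsAsPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let matched = 0;
  for (const field of indexFields) {
    if (wanted.has(field)) matched++;
    else break; // stop at the first index field the query does not use
  }
  return matched === wanted.size;
}

const index = ["x", "y", "z"];
console.log(supportsAsPrefix(index, ["x"]));      // true
console.log(supportsAsPrefix(index, ["x", "y"])); // true
console.log(supportsAsPrefix(index, ["y", "z"])); // false: not a prefix
```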
A covered index query is a query in which all the queried fields are part of an index. They are “covered queries” because an index “covers” the query. MongoDB can fulfill the query by using only the index. MongoDB need not scan documents from the database.
Querying only the index is much faster than querying documents. Index keys are typically smaller than the documents they catalog, and indexes are typically stored in RAM or located sequentially on disk.
MongoDB automatically uses a covered query when possible. To ensure use of a covered query, create an index that includes all the fields listed in the query and returned in the result. Because MongoDB returns the _id field by default, the projection document given to a query (to specify which fields MongoDB returns from the result set) must explicitly exclude the _id field from the result set, unless the index includes _id.
MongoDB cannot use a covered query if any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index and cannot support a covered query.
To test whether MongoDB used a covered query, use explain(). If the output displays true for the indexOnly field, MongoDB used a covered query. For more information see Measure Index Use.
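The covering conditions above can be summarized as a small predicate. The following sketch is hypothetical (field names are illustrative, not MongoDB internals): a query is covered when every filtered field and every returned field is part of the index.

```javascript
// Hypothetical sketch of the covering conditions described above: a query is
// covered when every filtered field and every returned field is in the index
// (with _id excluded from the result unless the index contains it).
function isCovered(indexFields, queryFields, returnedFields) {
  const idx = new Set(indexFields);
  return queryFields.every(f => idx.has(f)) &&
         returnedFields.every(f => idx.has(f));
}

// Index on { category: 1, item: 1 }; the projection excludes _id.
console.log(isCovered(["category", "item"], ["category"], ["item"]));  // true
console.log(isCovered(["category", "item"], ["category"], ["price"])); // false
```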
For the fastest performance when sorting query results by a given field, create an index on that field.
To sort query results on multiple fields, create a compound index. MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound index, ensure that all fields before the first sorted field are equality matches.
Example
If you create the following index:
{ a: 1, b: 1, c: 1, d: 1 }
The following query and sort operations can use the index:
db.collection.find().sort( { a:1 } )
db.collection.find().sort( { a:1, b:1 } )
db.collection.find( { a:4 } ).sort( { a:1, b:1 } )
db.collection.find( { b:5 } ).sort( { a:1, b:1 } )
db.collection.find( { a:5 } ).sort( { b:1, c:1 } )
db.collection.find( { a:5, c:4, b:3 } ).sort( { d:1 } )
db.collection.find( { a: { $gt:4 } } ).sort( { a:1, b:1 } )
db.collection.find( { a: { $gt:5 } } ).sort( { a:1, b:1 } )
db.collection.find( { a:5, b:3, d:{ $gt:4 } } ).sort( { c:1 } )
db.collection.find( { a:5, b:3, c:{ $lt:2 }, d:{ $gt:4 } } ).sort( { c:1 } )
However, the following queries cannot sort the results using the index:
db.collection.find().sort( { b:1 } )
db.collection.find( { b:5 } ).sort( { b:1 } )
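The working and non-working examples above follow one rule, which the sketch below expresses. It is a simplification: the real query planner also considers sort direction and operators, and the function names are illustrative only.

```javascript
// Rough sketch of the rule above: a compound index can satisfy a sort when
// the sorted fields follow, in index order, a prefix of fields that the
// query matches by equality.
function canSortWithIndex(indexFields, equalityFields, sortFields) {
  const eq = new Set(equalityFields);
  let i = 0;
  while (i < indexFields.length && eq.has(indexFields[i])) i++; // skip equality prefix
  return sortFields.every((f, j) => indexFields[i + j] === f);
}

const index = ["a", "b", "c", "d"];
console.log(canSortWithIndex(index, ["a"], ["b", "c"])); // true: find({a:5}).sort({b:1,c:1})
console.log(canSortWithIndex(index, [], ["b"]));         // false: find().sort({b:1})
```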
For fastest processing, ensure that your indexes fit entirely in RAM so that the system can avoid reading the index from disk.
To check the size of your indexes, use the db.collection.totalIndexSize() helper, which returns data in bytes:
> db.collection.totalIndexSize()
4294976499
The above example shows an index size of almost 4.3 gigabytes. To ensure this index fits in RAM, you must not only have more than that much RAM available but also must have RAM available for the rest of the working set. Also remember:
If you have and use multiple collections, you must consider the size of all indexes on all collections. The indexes and the working set must be able to fit in RAM at the same time.
There are some limited cases where indexes do not need to fit in RAM. See Indexes that Hold Only Recent Values in RAM.
See also
For additional collection statistics, use collStats or db.collection.stats().
Indexes do not have to fit entirely into RAM in all cases. If the value of the indexed field grows with every insert, and most queries select recently added documents, then MongoDB only needs to keep the parts of the index that hold the most recent or “right-most” values in RAM. This allows for efficient index use for read and write operations and minimizes the amount of RAM required to support the index.
Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
To ensure selectivity, write queries that limit the number of possible documents with the indexed field. Write queries that are appropriately selective relative to your indexed data.
Example
Suppose you have a field called status where the possible values are new and processed. If you add an index on status, you've created a low-selectivity index. The index will be of little help in locating records.
A better strategy, depending on your queries, would be to create a compound index that includes the low-selectivity field and another field. For example, you could create a compound index on status and created_at.
Another option, again depending on your use case, might be to use separate collections, one for each status.
Example
Consider an index { a : 1 } (i.e. an index on the key a sorted in ascending order) on a collection where a has three values evenly distributed across the collection:
{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 1, b: "cd" }
{ _id: ObjectId(), a: 1, b: "ef" }
{ _id: ObjectId(), a: 2, b: "jk" }
{ _id: ObjectId(), a: 2, b: "lm" }
{ _id: ObjectId(), a: 2, b: "no" }
{ _id: ObjectId(), a: 3, b: "pq" }
{ _id: ObjectId(), a: 3, b: "rs" }
{ _id: ObjectId(), a: 3, b: "tv" }
If you query for { a: 2, b: "no" } MongoDB must scan 3 documents in the collection to return the one matching result. Similarly, a query for { a: { $gt: 1}, b: "tv" } must scan 6 documents to return one result.
Consider the same index on a collection where a has nine values evenly distributed across the collection:
{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 2, b: "cd" }
{ _id: ObjectId(), a: 3, b: "ef" }
{ _id: ObjectId(), a: 4, b: "jk" }
{ _id: ObjectId(), a: 5, b: "lm" }
{ _id: ObjectId(), a: 6, b: "no" }
{ _id: ObjectId(), a: 7, b: "pq" }
{ _id: ObjectId(), a: 8, b: "rs" }
{ _id: ObjectId(), a: 9, b: "tv" }
If you query for { a: 2, b: "cd" }, MongoDB must scan only one document to fulfill the query. The index and query are more selective because the values of a are evenly distributed and the query can select a specific document using the index.
However, although the index on a is more selective, a query such as { a: { $gt: 5 }, b: "tv" } would still need to scan 4 documents.
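The scan counts quoted above can be reproduced with a toy simulation, assuming an index on { a: 1 }: the index narrows the candidate set to documents matching the condition on a, and MongoDB must then inspect each candidate to test the unindexed field b.

```javascript
// Toy simulation of the scan counts above for the nine-document collection.
const bValues = ["ab", "cd", "ef", "jk", "lm", "no", "pq", "rs", "tv"];
const docs = bValues.map((b, i) => ({ a: i + 1, b })); // a: 1..9, evenly distributed

function scanCount(docs, aPredicate) {
  // number of documents the index on a leaves for MongoDB to inspect
  return docs.filter(d => aPredicate(d.a)).length;
}

console.log(scanCount(docs, a => a === 2)); // 1: { a: 2, b: "cd" } scans one document
console.log(scanCount(docs, a => a > 5));   // 4: { a: { $gt: 5 }, b: "tv" } scans four
```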
If overall selectivity is low, and if MongoDB must read a number of documents to return results, then some queries may perform faster without indexes. To determine performance, see Measure Index Use.
If your application is write-heavy, then be careful when creating new indexes, since each additional index will impose a write-performance penalty. In general, don't be careless about adding indexes. Add indexes to complement your queries. Always have a good reason for adding a new index, and be sure to benchmark alternative strategies.
MongoDB must update all indexes associated with a collection after every insert, update, or delete operation. For update operations, if the updated document does not move to a new location, then MongoDB only modifies the updated fields in the index. Therefore, every index on a collection adds some amount of overhead to these write operations. In almost every case, the performance gains that indexes realize for read operations are worth the insertion penalty. However, in some cases, such as collections with a very high write-to-read ratio, the overhead of maintaining indexes may outweigh these gains.
MongoDB provides support for querying location-based data using special geospatial indexes. For an introduction to these 2d indexes, see 2d Geospatial Indexes.
MongoDB supports the following geospatial query types: proximity queries, bounded queries, and exact queries.
Proximity queries select the documents closest to the point specified in the query. To perform proximity queries, use either the find() method with the $near operator or the geoNear command.
The find() method with the $near operator returns 100 documents by default and sorts the results by distance. The $near operator uses the following form:
db.collection.find( { <location field>: { $near: [ x, y ] } } )
Example
The following query
db.places.find( { loc: {$near: [-70,40] } })
returns output similar to the following:
{ "_id" : ObjectId(" ... "), "loc" : [ -73, 40 ] }
The geoNear command returns more information than does the $near operator but does not sort results. The geoNear command also offers additional operators, such as operators to query for maximum or spherical distance. For a list of operators, see geoNear.
Without additional operators, the geoNear command uses the following form:
db.runCommand( {geoNear: "[collection]", near: [ x, y ] } )
Example
The following command returns the same results as the $near query in the previous example but with more information:
db.runCommand( {geoNear: "places", near: [ -74, 40.74 ] } )
This operation will return the following output:
{
"ns" : "test.places",
"near" : "0110000111111000000111111000000111111000000111111000",
"results" : [
{
"dis" : 3,
"obj" : {
"_id" : ObjectId(" ... "),
"loc" : [ -73, 40 ]
}
}
],
"stats" : {
"time" : 2,
"btreelocs" : 0,
"nscanned" : 1,
"objectsLoaded" : 1,
"avgDistance" : 3,
"maxDistance" : 3.0000188685220253
},
"ok" : 1
}
You can limit a proximity query to those documents that fall within a maximum distance of a point. You specify the maximum distance using the units specified by the coordinate system. For example, if the coordinate system uses meters, you specify maximum distance in meters.
To specify distance using the find() method, use the $maxDistance operator. Use the following form:
db.collection.find( { <location field>: { $near: [ x, y ] } , $maxDistance : z } )
To specify distance with the geoNear command, use the maxDistance option. Use the following form:
db.runCommand( { geoNear: "collection", near: [ x, y ], maxDistance: z } )
By default, geospatial queries using the find() method return 100 documents, sorted by distance.
To limit the result when using the find() method, use the limit() method, as in the following prototype:
db.collection.find( { <location field>: { $near: [ x, y ] } } ).limit(<n>)
To limit the result set when using the geoNear command, use the num option. The following is a prototype of the command:
db.runCommand( { geoNear: "collection", near: [ x, y ], num: z } )
The limit() method and the num option to geoNear do not limit geospatial query results by distance; they limit only the number of results. To limit geospatial search results by distance, please see the Distance Queries section.
Note
The limit() method and num option have different performance characteristics. Geospatial queries using the limit() method are slower than those using geoNear.
Geospatial queries with the find() method return 100 documents, sort them, and finally limit the result set. Geospatial queries with geoNear and the num option return only the specified number of unsorted documents.
Bounded queries return documents within a shape defined using the $within operator. MongoDB’s bounded queries support the following shapes: circles, rectangles (boxes), and polygons.
Bounded queries do not return sorted results. As a result MongoDB can return bounded queries more quickly than proximity queries. Bounded queries have the following form:
db.collection.find( { <location field> :
{ "$within" :
{ <shape> : <shape dimensions> }
}
} )
The following sections describe each of the shapes supported by bounded queries:
To query for documents with coordinates inside the bounds of a circle, specify the center and the radius of the circle using the $within operator and $center option. Consider the following prototype query:
db.collection.find( { "field": { "$within": { "$center": [ center, radius ] } } } )
The following example query returns all documents that have coordinates that exist within the circle centered on [-74, 40.74] and with a radius of 10, using a geospatial index on the loc field:
db.places.find( { "loc": { "$within":
{ "$center": [ [-74, 40.74], 10 ] }
}
} )
The $within operator using $center is similar to using $maxDistance, but $center has different performance characteristics. MongoDB does not sort the results of queries that use the $within operator, unlike queries that use the $near operator.
To query for documents with coordinates inside the bounds of a rectangle, specify the lower-left and upper-right corners of the rectangle using the $within operator and $box option. Consider the following prototype query:
db.collection.find( { "field": { "$within": { "$box": [ coordinate0, coordinate1 ] } } } )
The following query returns all documents that have coordinates that exist within the rectangle where the lower-left corner is at [ 0, 0 ] and the upper-right corner is at [ 3, 3 ], using a geospatial index on the loc field:
db.places.find( { "loc": { "$within": { "$box": [ [0, 0] , [3, 3] ] } } } )
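The semantics of the $box query above amount to a simple containment test, sketched below. This is an illustration of the shape's meaning, not MongoDB's implementation; edge-inclusivity behavior may differ.

```javascript
// Illustrative semantics of a $box bounded query: a point matches when each
// coordinate lies between the lower-left and upper-right corners.
function inBox([x, y], [llx, lly], [urx, ury]) {
  return x >= llx && x <= urx && y >= lly && y <= ury;
}

console.log(inBox([2, 2], [0, 0], [3, 3])); // true: inside the rectangle
console.log(inBox([4, 1], [0, 0], [3, 3])); // false: x is out of bounds
```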
New in version 1.9: Support for polygon queries.
To query for documents with coordinates inside of a polygon, specify the points of the polygon in an array, using the $within operator with the $polygon option. MongoDB automatically connects the last point in the array to the first point. Consider the following prototype query:
db.places.find({ "loc": { "$within": { "$polygon": [ points ] } } })
The following query returns all documents that have coordinates that exist within the polygon defined by [ [0,0], [3,3], [6,0] ]:
db.places.find({ "loc": { "$within": { "$polygon":
[ [ 0,0], [3,3], [6,0] ] } } } )
You can use the db.collection.find() method to query for an exact match on a location. These queries have the following form:
db.collection.find( { <location field>: [ x, y ] } )
This query will return any documents where the location field holds the value [ x, y ].
Exact geospatial queries have applicability for only a limited selection of cases; proximity and bounded queries provide more useful results for most applications.
When you query using the 2d index, MongoDB calculates distances using flat geometry by default, which models points on a flat surface.
Optionally, you may instruct MongoDB to calculate distances using spherical geometry, which models points on a spherical surface. Spherical geometry is useful for modeling coordinates on the surface of Earth.
To calculate distances using spherical geometry, use MongoDB’s spherical query operators and options: the $nearSphere and $centerSphere operators, and the spherical option to the geoNear command.
See also
For more information on differences between flat and spherical distance calculation, see Distance Calculation.
The distanceMultiplier option of the geoNear command multiplies all reported distances by an assigned value before returning them. This allows MongoDB to return converted values, and removes the requirement to convert units in application logic.
Note
Because distanceMultiplier is an option to geoNear, the multiplication operation occurs on the mongod process. The operation adds a slight overhead to the operation of geoNear.
Using distanceMultiplier in spherical queries allows you to use results from the geoNear command without radian-to-distance conversion. The following example uses distanceMultiplier in the geoNear command with a spherical example:
db.runCommand( { geoNear: "places",
near: [ -74, 40.74 ],
spherical: true,
distanceMultiplier: 3963.192
} )
The output of the above operation would resemble the following:
{
// [ ... ]
"results" : [
{
"dis" : 73.46525170413567,
"obj" : {
"_id" : ObjectId( ... )
"loc" : [
-73,
40
]
}
}
],
"stats" : {
// [ ... ]
"avgDistance" : 0.01853688938212826,
"maxDistance" : 0.01853714811400047
},
"ok" : 1
}
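The raw spherical distances are radians, and distanceMultiplier simply scales them by the Earth's radius in miles (3963.192, an approximation) so that the server returns miles directly. Reproducing the arithmetic from the example outputs above:

```javascript
// Relate the unmultiplied spherical distance (radians) to the multiplied
// result (miles) shown in the outputs above.
const radians = 0.01853688938212826; // "dis" without distanceMultiplier
const earthRadiusMiles = 3963.192;   // approximate radius of the Earth
const miles = radians * earthRadiusMiles;

console.log(miles); // ~73.465, matching the "dis" value in the output above
```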
See also
The Distance operator.
Haystack indexes are special 2d geospatial indexes optimized to return results over small areas. To create haystack indexes, see Haystack Indexes.
To query the haystack index, use the geoSearch command. You must specify both the coordinates and the other indexed field to geoSearch, which takes the following form:
db.runCommand( { geoSearch: <haystack index>,
near: [ x, y ],
search: { <field>: <value> } } )
For example, to return all documents with the value restaurant in the type field near the example point, the command would resemble:
db.runCommand( { geoSearch: "places",
search: { type: "restaurant" },
near: [-74, 40.74] } )
Note
Haystack indexes are not suited to returning a full list of the closest documents to a particular location, as the closest documents could be far away compared to the bucketSize.
Note
Spherical query operations are not currently supported by haystack indexes.
The find() method and geoNear command cannot access the haystack index.
2d geospatial indexes support efficient queries using location-based data in a document, and special geospatial query operators. You can store two-dimensional location coordinates in documents and with a geospatial index on this field, construct location-based queries. For example, you can query for documents based on proximity to another location or based on inclusion in a specified region.
Additionally, geospatial indexes support queries on both the coordinate field and another field. For example, you might write a query to find restaurants a specific distance from a hotel or to find museums found within a certain defined neighborhood.
This document describes how to include location data in your documents and how to create geospatial indexes. For information on querying data stored in geospatial indexes, see Geospatial Queries with 2d Indexes.
To use 2d geospatial indexes, you must model location data on a predetermined two-dimensional coordinate system, such as longitude and latitude. You store location data as two-dimensional coordinates in a field that holds either a two-dimensional array or an embedded document. Consider the following two examples:
loc : [ x, y ]
loc : { x: 1, y: 2 }
All documents must store location data in the same order; however, if you use latitude and longitude as your coordinate system, always store longitude first. MongoDB’s 2d spherical index operators only recognize [ longitude, latitude ] ordering.
Important
MongoDB only supports one geospatial index per collection.
To create a geospatial index, use the ensureIndex method with the value 2d for the location field of your collection. Consider the following prototype:
db.collection.ensureIndex( { <location field> : "2d" } )
MongoDB’s special geospatial operations use this index when querying for location data.
When you create the index, MongoDB converts location data to binary geohash values, and calculates these values using the location data and the index’s location range, as described in Location Range. The default range for 2d indexes assumes longitude and latitude and uses the bounds -180 inclusive and 180 non-inclusive.
Important
The default boundaries of 2d indexes allow applications to insert documents with invalid latitudes greater than 90 or less than -90. The behavior of geospatial queries with such invalid points is not defined.
When creating a 2d index, MongoDB provides the following options: a location range (min and max) and a precision (bits).
All 2d geospatial indexes have boundaries defined by a coordinate range. By default, 2d geospatial indexes assume longitude and latitude have boundaries of -180 inclusive and 180 non-inclusive (i.e. [-180, 180)). MongoDB returns an error and rejects documents with coordinate data outside of the specified range.
To build an index with a location range other than the default, use the min and max options with the ensureIndex() operation when creating a 2d index, as in the following prototype:
db.collection.ensureIndex( { <location field>: "2d" } ,
{ min: <lower bound> , max: <upper bound> } )
2d indexes use a geohash representation of all coordinate data internally. Geohashes have a precision determined by the number of bits in the hash. More bits allow the index to provide results with greater precision, while fewer bits allow the index to provide results with only more limited precision.
Indexes with lower precision have a lower processing overhead for insert operations and consume less space; however, higher precision means that queries need to scan smaller portions of the index to return results. The actual stored values are always used in the final query processing, and index precision does not affect query accuracy.
By default, geospatial indexes use 26 bits of precision, which is roughly equivalent to 2 feet or about 60 centimeters of precision using the default range of -180 to 180. You can configure 2d geospatial indexes with up to 32 bits of precision.
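A back-of-the-envelope check of the "26 bits ≈ 2 feet" claim, assuming the default range and an Earth circumference of roughly 40,075 km: each additional bit per dimension halves the cell size.

```javascript
// Approximate the ground size of one geohash cell for a given per-dimension
// bit precision, assuming the default [-180, 180) range mapped onto the Earth.
const earthCircumferenceMeters = 40075000; // approximate

function cellSizeMeters(bits) {
  return earthCircumferenceMeters / Math.pow(2, bits);
}

console.log(cellSizeMeters(26)); // roughly 0.6 meters, i.e. about 2 feet
```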
To configure a location precision other than the default, use the bits option in the ensureIndex() method, as in the following prototype:
db.collection.ensureIndex( {<location field>: "2d"} ,
{ bits: <bit precision> } )
For more information on the relationship between bits and precision, see Geohash Values.
2d geospatial indexes may be compound, if and only if the field with location data is the first field. A compound geospatial index makes it possible to construct queries that primarily select on a location-based field, but also select on a second criterion. For example, you could use this kind of index to support queries for carpet wholesalers within a specific region.
Note
Geospatial queries will only use additional query parameters after applying the geospatial criteria. If your geospatial query criteria select a large number of documents, the additional query will only filter the result set, and not result in a more targeted query.
To create a geospatial index with two fields, specify the location field first, then the second field. For example, to create a compound index on the loc location field and on the product field (sorted in ascending order), you would issue the following:
db.storeInfo.ensureIndex( { loc: "2d", product: 1 } );
This creates an index that supports queries on just the location field (i.e. loc), as well as queries on both loc and product.
Haystack indexes create “buckets” of documents from the same geographic area in order to improve performance for queries limited to that area.
Each bucket in a haystack index contains all the documents within a specified proximity to a given longitude and latitude. Use the bucketSize parameter of ensureIndex() to determine proximity. A bucketSize of 5 creates an index that groups location values that are within 5 units of the specified longitude and latitude.
bucketSize also determines the granularity of the index. You can tune the parameter to the distribution of your data so that in general you search only very small regions of a two-dimensional space. Furthermore, the areas defined by buckets can overlap: as a result a document can exist in multiple buckets.
To build a haystack index, use the bucketSize parameter in the ensureIndex() method, as in the following prototype:
db.collection.ensureIndex({ <location field>: "geoHaystack", type: 1 },
{ bucketSize: <bucket value> })
Example
Consider a collection with documents that contain fields similar to the following:
{ _id : 100, pos : { long : 126.9, lat : 35.2 }, type : "restaurant"}
{ _id : 200, pos : { long : 127.5, lat : 36.1 }, type : "restaurant"}
{ _id : 300, pos : { long : 128.0, lat : 36.7 }, type : "national park"}
The following operation creates a haystack index with buckets that store keys within 1 unit of longitude or latitude:
db.mydb.ensureIndex( { pos : "geoHaystack", type : 1 }, { bucketSize : 1 } )
Therefore, this index stores the document with an _id field that has the value 200 in two different buckets.
To query using a haystack index you use the geoSearch command. For command details, see Querying Haystack Indexes.
Haystack indexes are ideal for returning documents based on location and an exact match on a single additional criteria. These indexes are not necessarily suited to returning the closest documents to a particular location.
Spherical queries are not supported by geospatial haystack indexes.
By default, queries that use a haystack index return 50 documents.
MongoDB performs distance calculations as part of 2d geospatial queries. By default, MongoDB uses flat geometry to calculate distances between points. MongoDB also supports distance calculations using spherical geometry, to provide accurate distances for geospatial information based on a sphere, such as the Earth.
Spherical Queries Use Radians for Distance
For spherical operators to function properly, you must convert distances to radians, and convert from radians back to the distance units used by your application.
To convert:
distance to radians: divide the distance by the radius of the sphere (e.g. the Earth) in the same units as the distance measurement.
radians to distance: multiply the radian measure by the radius of the sphere (e.g. the Earth) in the units system that you want to convert the distance to.
The radius of the Earth is approximately 3963.192 miles or 6378.137 kilometers.
The following query would return documents from the places collection, within the circle described by the center [ -74, 40.74 ] with a radius of 100 miles:
db.places.find( { loc: { $centerSphere: [ [ -74, 40.74 ] ,
100 / 3963.192 ] } } )
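The conversions above can be wrapped in small helpers; the Earth radius is the approximation given in the text.

```javascript
// Radian conversions for spherical geospatial queries, using the approximate
// Earth radius in miles quoted in the text.
const EARTH_RADIUS_MILES = 3963.192;

function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}

function radiansToMiles(radians) {
  return radians * EARTH_RADIUS_MILES;
}

console.log(milesToRadians(100)); // ~0.02523, the radius passed to $centerSphere above
```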
You may also use the distanceMultiplier option to the geoNear command to convert radians in the mongod process, rather than in your application code. Please see the distance multiplier section.
The following spherical 2d query returns all documents in the collection places within 100 miles of the point [ -74, 40.74 ]:
db.runCommand( { geoNear: "places",
near: [ -74, 40.74 ],
spherical: true
} )
The output of the above command would be:
{
// [ ... ]
"results" : [
{
"dis" : 0.01853688938212826,
"obj" : {
"_id" : ObjectId( ... )
"loc" : [
-73,
40
]
}
}
],
"stats" : {
// [ ... ]
"avgDistance" : 0.01853688938212826,
"maxDistance" : 0.01853714811400047
},
"ok" : 1
}
Warning
Spherical queries that wrap around the poles or at the transition from -180 to 180 longitude raise an error.
Note
While the default Earth-like bounds for geospatial indexes are between -180 inclusive and 180 non-inclusive, valid values for latitude are only between -90 and 90.
To create a geospatial index, MongoDB computes the geohash value for coordinate pairs within the specified range, and indexes the geohash for that point.
To calculate a geohash value, continuously divide a 2D map into quadrants. Then, assign each quadrant a two-bit value. For example, a two-bit representation of four quadrants would be:
01 11
00 10
These two bit values, 00, 01, 10, and 11, represent each of the quadrants and all points within each quadrant. For a geohash with two bits of resolution, all points in the bottom left quadrant would have a geohash of 00. The top left quadrant would have the geohash of 01. The bottom right and top right would have a geohash of 10 and 11, respectively.
To provide additional precision, continue dividing each quadrant into sub-quadrants. Each sub-quadrant would have the geohash value of the containing quadrant concatenated with the value of the sub-quadrant. The geohash for the upper-right quadrant is 11, and the geohash for the sub-quadrants would be (clockwise from the top left): 1101, 1111, 1110, and 1100, respectively.
To calculate a more precise geohash, continue dividing the sub-quadrant and concatenate the two-bit identifier for each division. The more “bits” in the hash identifier for a given point, the smaller possible area that the hash can describe and the higher the resolution of the geospatial index.
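The quadrant subdivision described above can be sketched in a few lines. The function below is illustrative only and does not reproduce MongoDB's exact internal encoding; each level appends one bit for the x-half (0 left, 1 right) and one for the y-half (0 bottom, 1 top), matching the quadrant values in the diagram.

```javascript
// Sketch of the quadrant-splitting geohash scheme described above.
function geohash(x, y, min, max, levels) {
  let hash = "";
  let xMin = min, xMax = max, yMin = min, yMax = max;
  for (let i = 0; i < levels; i++) {
    const xMid = (xMin + xMax) / 2;
    const yMid = (yMin + yMax) / 2;
    if (x >= xMid) { hash += "1"; xMin = xMid; } else { hash += "0"; xMax = xMid; }
    if (y >= yMid) { hash += "1"; yMin = yMid; } else { hash += "0"; yMax = yMid; }
  }
  return hash;
}

console.log(geohash(-90, -90, -180, 180, 1)); // "00": bottom-left quadrant
console.log(geohash(45, 135, -180, 180, 2));  // "1101": top-left sub-quadrant of top-right
```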
You cannot use a geospatial index as a shard key when sharding a collection. However, you can create and maintain a geospatial index on a sharded collection, using a different field as the shard key. Your application may query for geospatial data using geoNear and $within; however, queries using $near are not supported for sharded collections.
New in version 2.0: Support for multiple locations in a document.
While 2d indexes do not support more than one set of coordinates in a document, you can use a multi-key index to store and index multiple coordinate pairs in a single document. In the simplest example, you may have a field (e.g. locs) that holds an array of coordinates, as in the following prototype data model:
{
"_id": ObjectId(...),
"locs": [
[ 55.5, 42.3 ],
[ -74, 44.74 ],
{ "lat": 55.3, "long": 40.2 }
]
}
The values of the array may either be arrays holding coordinates, as in [ 55.5, 42.3 ] or embedded documents as in { "lat": 55.3, "long": 40.2 }.
You could then create a geospatial index on the locs field, as in the following:
db.places.ensureIndex( { "locs": "2d" } )
You may also model the location data as a field inside of a sub-document. In this case, the document would contain a field (e.g. addresses) that holds an array of documents where each document has a field (e.g. loc) that holds location coordinates. Consider the following prototype data model:
{
"_id": ObjectId(...),
"name": "...",
"addresses": [
{
"context": "home",
"loc": [ 55.5, 42.3 ]
},
{
"context": "work",
"loc": [ -74, 44.74 ]
}
]
}
Then, create the geospatial index on the addresses.loc field as in the following example:
db.records.ensureIndex( { "addresses.loc": "2d" } )
For documents with multiple coordinate values, queries may return the same document multiple times if more than one indexed coordinate pair satisfies the query constraints. To force MongoDB to return each document at most once, use the uniqueDocs parameter to geoNear or the $uniqueDocs operator in conjunction with $within.
To include the location field with the distance field in multi-location document queries, specify includeLocs: true in the geoNear command.
MongoDB provides language-specific client libraries called drivers that let you develop applications to interact with your databases.
This page lists the documents, tutorials, and reference pages that describe application development. For API-level documentation, see Drivers.
For an overview of topics with which every MongoDB application developer will want familiarity, see the aggregation and indexes documents. For an introduction to basic MongoDB use, see the administration tutorials.
See also
Developer Zone wiki pages and the FAQ: MongoDB for Application Developers document. Developers also should be familiar with Using the MongoDB Shell and the MongoDB query and update operators.
The following documents outline basic application development topics:
Applications communicate with MongoDB by way of a client library or driver that handles all interaction with the database in a language-appropriate and sensible manner. See the MongoDB wiki drivers page for more information:
Data in MongoDB has a flexible schema. Collections do not enforce document structure. This means that:
Each document only needs to contain fields relevant to the entity or object that the document represents. In practice, most documents in a collection share a similar structure. Schema flexibility means that you can model your documents in MongoDB so that they closely resemble and reflect application-level objects.
As in all data modeling, when developing data models (i.e. schema designs) for MongoDB you must consider the inherent properties and requirements of the application objects and the relationships between application objects. MongoDB data models must also reflect:
These considerations and requirements force developers to make a number of multi-factored decisions when modeling data, including:
normalization and de-normalization.
These decisions reflect the degree to which the data model should store related pieces of data in a single document or should model relationships using references between documents.
representation of data in arrays in BSON.
A number of data models may be functionally equivalent for a given application; however, different data models may have significant impacts on MongoDB and application performance.
This document provides a high level overview of these data modeling decisions and factors. In addition, consider the Data Modeling Patterns and Examples section, which provides more concrete examples of all the discussed patterns.
Data modeling decisions involve determining how to structure the documents to model the data effectively. The primary decision is whether to embed or to use references.
To de-normalize data, store two related pieces of data in a single document.
Operations within a document are less expensive for the server than operations that involve multiple documents.
In general, use embedded data models when:
Embedding provides the following benefits:
Embedding related data in documents can lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation. Furthermore, documents in MongoDB must be smaller than the maximum BSON document size. For larger documents, consider using GridFS.
For examples of accessing embedded documents, see Subdocuments.
See also
To normalize data, store references between two documents to indicate a relationship between the data represented in each document.
In general, use normalized data models:
Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server.
See Model Referenced One-to-Many Relationships Between Documents for an example of referencing.
MongoDB only provides atomic operations on the level of a single document. [1] As a result, the need for atomic operations influences the decision to use embedded or referenced relationships when modeling data for MongoDB.
Embed fields that need to be modified together atomically in the same document. See Model Data for Atomic Operations for an example of atomic updates within a single document.
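As a sketch of this principle, consider a hypothetical inventory document in which a product's quantity and sold counts must change together. Because both fields live in the same document, a single update modifies them atomically:

```javascript
// Sketch (hypothetical products collection): both fields change in one atomic
// operation because they are embedded in the same document.
db.products.update(
    { _id: 1, quantity: { $gt: 0 } },   // only update if stock remains
    { $inc: { quantity: -1, sold: 1 } }
)
```

Had quantity and sold lived in separate documents, no single operation could update both atomically.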
| [1] | Document-level atomic operations include all operations within a single MongoDB document record: operations that affect multiple sub-documents within that single record are still atomic. |
In addition to normalization and de-normalization concerns, a number of other operational factors help shape data modeling decisions in MongoDB. These factors include:
These factors have implications for database and application performance as well as future maintenance and development costs.
Data modeling decisions should also take data lifecycle management into consideration.
The Time to Live or TTL feature of collections expires documents after a period of time. Consider using the TTL feature if your application requires some data to persist in the database for a limited period of time.
Additionally, if your application only uses recently inserted documents, consider Capped Collections. Capped collections provide first-in-first-out (FIFO) management of inserted documents and are optimized to support operations that insert and read documents based on insertion order.
In certain situations, you might choose to store information in several collections rather than in a single collection.
Consider a sample collection logs that stores log documents for various environment and applications. The logs collection contains documents of the following form:
{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ...}
If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs.dev and logs.debug. The logs.dev collection would contain only the documents related to the dev environment.
Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.
When using models that have a large number of collections, consider the following behaviors:
A single <database>.ns file stores all meta-data for each database. Each index and collection has its own entry in the namespace file, and MongoDB places limits on the size of namespace files.
Because of limits on namespaces, you may wish to know the current number of namespaces in order to determine how many additional namespaces the database can support, as in the following example:
db.system.namespaces.count()
The <database>.ns file defaults to 16 MB. To change the size of the <database>.ns file, pass a new size in megabytes to the --nssize option on server start.
The --nssize option sets the size for new <database>.ns files. For existing databases, after starting up the server with --nssize, run the db.repairDatabase() command from the mongo shell.
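The steps above can be sketched as follows (the 32 MB size and /data/db path are placeholders for your own values):

```shell
# Start mongod with a 32 MB namespace file size for newly created databases.
mongod --nssize 32 --dbpath /data/db

# For a database that already exists, the new size takes effect only after a
# repair; from the mongo shell, against that database, run:
#   db.repairDatabase()
```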
Create indexes to support common queries. Generally, indexes and index use in MongoDB correspond to indexes and index use in relational databases: build indexes on fields that appear often in queries and for all operations that return sorted results. MongoDB automatically creates a unique index on the _id field.
As you create indexes, consider the following behaviors of indexes:
See Indexing Strategies for more information on determining indexes. Additionally, the MongoDB database profiler may help identify inefficient queries.
Sharding allows users to partition a collection within a database to distribute the collection’s documents across a number of mongod instances or shards.
The shard key determines how MongoDB distributes data among shards in a sharded collection. Selecting the proper shard key has significant implications for performance.
See Sharding Fundamentals for more information on sharding and the selection of the shard key.
Certain updates to documents can increase the document size, such as pushing elements to an array and adding new fields. If the document size exceeds the allocated space for that document, MongoDB relocates the document on disk. This internal relocation can be both time and resource consuming.
Although MongoDB automatically provides padding to minimize the occurrence of relocations, you may still need to manually handle document growth. Refer to Pre-Aggregated Reports for an example of the Pre-allocation approach to handle document growth.
The following documents provide overviews of various data modeling patterns and common schema design considerations:
For more information and examples of real-world data modeling, consider the following external resources:
There are many factors that can affect performance of operations in MongoDB, including index use, query structure, data modeling, application design and architecture, as well as operational factors including architecture and system configuration. This document addresses key application optimization strategies, and includes examples and links to relevant reference material.
This section describes techniques for optimizing database performance with MongoDB with particular attention to query performance and basic client operations.
For commonly issued queries, create indexes. If a query searches multiple fields, create a compound index. Scanning an index is much faster than scanning a collection. Index structures are smaller than the documents they reference, and they store references in order.
Example
If you have a posts collection containing blog posts, and if you regularly issue a query that sorts on the author_name field, then you can optimize the query by creating an index on the author_name field:
db.posts.ensureIndex( { author_name : 1 } )
Indexes also improve efficiency on queries that routinely sort on a given field.
Example
If you regularly issue a query that sorts on the timestamp field, then you can optimize the query by creating an index on the timestamp field.
Creating this index:
db.posts.ensureIndex( { timestamp : 1 } )
Optimizes this query:
db.posts.find().sort( { timestamp : -1 } )
Because MongoDB can read indexes in both ascending and descending order, the direction of a single-key index does not matter.
Indexes support queries, update operations, and some phases of the aggregation pipeline.
MongoDB cursors return results in groups of multiple documents. If you know the number of results you want, you can reduce the demand on network resources by using the cursor.limit() method.
This is typically used in conjunction with sort operations. For example, if you need only 10 results from your query to the posts collection, you would issue the following command:
db.posts.find().sort( { timestamp : -1 } ).limit(10)
For more information on limiting results, see cursor.limit()
When you need only a subset of fields from documents, you can achieve better performance by returning only the fields you need:
For example, if in your query to the posts collection, you need only the timestamp, title, author, and abstract fields, you would issue the following command:
db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )
For more information on using projections, see Result Projections.
MongoDB provides a database profiler that shows performance characteristics of each operation against the database. Use the profiler to locate any queries or write operations that are running slow. You can use this information, for example, to determine what indexes to create.
For more information, see Database Profiling.
The db.currentOp() method reports on current operations running on a mongod instance. For documentation of the output of db.currentOp() see Current Operation Reporting.
The explain() method returns statistics on a query, and reports the index MongoDB selected to fulfill the query, as well as information about the internal operation of the query.
Example
To use explain() on a query for documents matching the expression { a: 1 }, in the collection records, use an operation that resembles the following in the mongo shell:
db.records.find( { a: 1 } ).explain()
In most cases the query optimizer selects the optimal index for a specific operation; however, you can force MongoDB to use a specific index using the hint() method. Use hint() to support performance testing, or on some queries where you must select a field or fields included in several indexes.
Use MongoDB’s $inc operator to increment or decrement values in documents. The operator increments the value of the field on the server side, as an alternative to selecting a document, making simple modifications in the client and then writing the entire document to the server. The $inc operator can also help avoid race conditions, which would result when two application instances queried for a document, manually incremented a field, and saved the entire document back at the same time.
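As a sketch (with a hypothetical pages collection), the following increments a counter on the server in a single operation, rather than reading the document, modifying it in the client, and writing it back:

```javascript
// Sketch: server-side increment. Two concurrent clients issuing this update
// cannot lose an increment, unlike a client-side read-modify-write cycle.
db.pages.update(
    { url: "/index" },
    { $inc: { views: 1 } }
)
```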
For some kinds of operations, you can perform operations on the mongod server itself rather than writing a client application to perform a simple task. This can eliminate network overhead for some basic administrative operations. Consider the following example:
Example
For example, if you want to remove a field from all documents in a collection, performing the operation directly on the server is more efficient than transmitting the collection to your client and back again.
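A sketch of this approach (the collection and field names are placeholders): the $unset operator removes the field on the server, and the fourth argument to update() enables multi-document updates in the 2.2-era shell:

```javascript
// Sketch: remove obsoleteField from every document in the collection without
// transmitting any documents to the client. Arguments: query, update, upsert, multi.
db.collection.update( { }, { $unset: { obsoleteField: 1 } }, false, true )
```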
For more information, see the Server-side Code Execution wiki page.
Capped Collections are circular, fixed-size collections that keep documents well-ordered, even without the use of an index. This means that capped collections can receive very high-speed writes and sequential reads.
These collections are particularly useful for keeping log files but are not limited to that purpose. Use capped collections where appropriate.
To return documents in the order they exist on disk, return sorted operations using the $natural operator. Natural order does not use indexes but can be fast for operations when you want to select the first or last items on disk. This is particularly useful for capped collections.
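For example, the following sketch returns the most recently inserted document of a capped collection by reading disk (insertion) order in reverse, with no index required:

```javascript
// Sketch: $natural: -1 walks the collection in reverse insertion order, so
// limit(1) yields the newest document in a capped collection.
db.cappedCollection.find().sort( { $natural: -1 } ).limit( 1 )
```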
MongoDB does not support joins. In MongoDB some data is “denormalized,” or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
MongoDB applications use one of two methods for relating documents:
Manual references where you save the _id field of one document in another document as a reference. Then your application can run a second query to return the embedded data. These references are simple and sufficient for most use cases.
DBRefs are references from one document to another using the value of the first document's _id field, its collection name, and, optionally, its database name. To resolve DBRefs, your application must perform additional queries to return the referenced documents. Many drivers have helper methods that form the query for the DBRef automatically. The drivers [1] do not automatically resolve DBRefs into documents.
Use a DBRef when you need to embed documents from multiple collections in documents from one collection. DBRefs also provide a common format and type to represent these relationships among documents. The DBRef format provides common semantics for representing links between documents if your database must interact with multiple frameworks and tools.
Unless you have a compelling reason for using a DBRef, use manual references.
| [1] | Some community supported drivers may have alternate behavior and may resolve a DBRef into a document automatically. |
Manual references refers to the practice of including one document’s _id field in another document. The application can then issue a second query to resolve the referenced fields as needed.
Consider the following operation to insert two documents, using the _id field of the first document as a reference in the second document:
original_id = ObjectId()

db.places.insert({
    "_id": original_id,
    "name": "Broadway Center",
    "url": "bc.example.net"
})

db.people.insert({
    "name": "Erin",
    "places_id": original_id,
    "url": "bc.example.net/Erin"
})
Then, when a query returns the document from the people collection you can, if needed, make a second query for the document referenced by the places_id field in the places collection.
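The second query described above can be sketched as follows, continuing the example:

```javascript
// Sketch: resolve the manual reference with a follow-up query.
var person = db.people.findOne( { name: "Erin" } )
var place  = db.places.findOne( { _id: person.places_id } )
```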
For nearly every case where you want to store a relationship between two documents, use manual references. The references are simple to create and your application can resolve references as needed.
The only limitation of manual linking is that these references do not convey the database and collection name. If you have documents in a single collection that relate to documents in more than one collection, you may need to consider using DBRefs.
DBRefs are a convention for representing a document, rather than a specific reference “type.” They include the name of the collection, and in some cases the database, in addition to the value from the _id field.
DBRefs have the following fields:
The $ref field holds the name of the collection where the referenced document resides.
The $id field contains the value of the _id field in the referenced document.
Optional.
Contains the name of the database where the referenced document resides.
Only some drivers support $db references.
Thus a DBRef document would resemble the following:
{ $ref : <value>, $id : <value>, $db : <value> }
Note
The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
In most cases you should use the manual reference method for connecting two or more related documents. However, if you need to reference documents from multiple collections, consider a DBRef.
MongoDB supports server-side execution of JavaScript code using various methods.
Note
The JavaScript code execution takes a JavaScript lock.
MongoDB performs the execution of JavaScript functions for Map-Reduce operations on the server. Within these JavaScript functions, you must not access the database for any reason, including to perform reads.
See the db.collection.mapReduce() and the Map-Reduce documentation for more information, including examples of map-reduce. See map-reduce concurrency section for concurrency information for map-reduce.
The eval command, and the corresponding mongo shell method db.eval(), evaluates JavaScript functions on the database server. This command may be useful if you need to touch a lot of data lightly since the network transfer of the data could become a bottleneck if performing these operations on the client-side.
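A minimal sketch of db.eval() usage (the function itself is a placeholder): the function is shipped to the server and evaluated there, with the remaining arguments passed to it:

```javascript
// Sketch: the anonymous function runs on the database server, not the client.
db.eval( function( x, y ) { return x + y; }, 3, 5 )
```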
Warning
By default, the eval command requires a write lock. As such, eval will block all other read and write operations while it runs.
See eval command and db.eval() documentation for more information, including examples.
Running a JavaScript (.js) file using a mongo shell instance on the server is a good technique for performing batch administrative work. When you run the mongo shell on the server, connecting via the localhost interface, the connection is fast with low latency. Additionally, this technique has an advantage over the eval command, since eval blocks all other operations.
In read operations, in addition to the standard operators (e.g. $gt, $lt), the $where operator lets you express the query condition as either a string or a full JavaScript function that specifies a SQL-like WHERE clause. However, use the standard operators whenever possible, since $where operations have significantly slower performance.
Warning
Do not write to the database within the $where JavaScript function.
See $where documentation for more information, including examples.
Note
If possible, avoid using server-side stored functions.
There is a special system collection named system.js that can store JavaScript functions for reuse.
To store a function, use the db.collection.save() method, as in the following example:
db.system.js.save(
    {
        _id: "myAddFunction",
        value: function (x, y) { return x + y; }
    }
);
Once you save a function in the system.js collection, you can use the function from any JavaScript context (e.g. eval, $where, map-reduce).
Consider the following example from the mongo shell that first saves a function named echoFunction to the system.js collection and calls the function using db.eval():
db.system.js.save(
    {
        _id: "echoFunction",
        value: function (x) { return x; }
    }
)
db.eval( "echoFunction( 'test' )" )
See http://github.com/mongodb/mongo/tree/master/jstests/storefunc.js for a full example.
New in version 2.1: In the mongo shell, you can use db.loadServerScripts() to load all the scripts saved in the system.js collection for the current db. Once loaded, you can invoke the functions directly in the shell, as in the following example:
db.loadServerScripts();
echoFunction(3);
myAddFunction(3, 5);
Refer to the individual method or operator documentation for any concurrency information. See also the concurrency table.
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB.
Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, [1] and stores each of those chunks as a separate document. By default, GridFS limits chunk size to 256k. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
When you query a GridFS store for a file, the driver or client will reassemble the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, which allows you to “skip” into the middle of a video or audio file.
GridFS is useful not only for storing files that exceed 16MB but also for storing any files for which you want access without having to load the entire file into memory. For more information on the indications of GridFS, see When should I use GridFS?.
| [1] | The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding. |
To store and retrieve files using GridFS, use either of the following:
GridFS stores files in two collections:
GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections with names prefixed by the fs bucket:
You can choose a different bucket name than fs, and create multiple buckets in a single database.
Each document in the chunks collection represents a distinct chunk of a file as represented in the GridFS store. The following is a prototype document from the chunks collection:
{
    "_id" : <string>,
    "files_id" : <string>,
    "n" : <num>,
    "data" : <binary>
}
A document from the chunks collection contains the following fields:
The _id of the “parent” document, as specified in the files collection.
The sequence number of the chunk. GridFS numbers all chunks, starting with 0.
The chunks collection uses a compound index on files_id and n, as described in GridFS Index.
Each document in the files collection represents a file in the GridFS store. Consider the following prototype of a document in the files collection:
{
    "_id" : <ObjectID>,
    "length" : <num>,
    "chunkSize" : <num>,
    "uploadDate" : <timestamp>,
    "md5" : <hash>,
    "filename" : <string>,
    "contentType" : <string>,
    "aliases" : <string array>,
    "metadata" : <dataObject>
}
Documents in the files collection contain some or all of the following fields. Applications may create additional arbitrary fields:
The unique ID for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectID.
The size of the document in bytes.
The size of each chunk. GridFS divides the document into chunks of the size specified here. The default size is 256 kilobytes.
The date the document was first stored by GridFS. This value has the Date type.
An MD5 hash returned from the filemd5 API. This value has the String type.
Optional. A human-readable name for the document.
Optional. A valid MIME type for the document.
Optional. An array of alias strings.
Optional. Any additional information you want to store.
GridFS uses a unique, compound index on the chunks collection for files_id and n. The index allows efficient retrieval of chunks using the files_id and n values, as shown in the following example:
cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
See the relevant driver documentation for the specific behavior of your GridFS application. If your driver does not create this index, issue the following operation using the mongo shell:
db.fs.chunks.ensureIndex( { files_id: 1, n: 1 }, { unique: true } );
The following is an example of the GridFS interface in Java. The example is for demonstration purposes only. For API specifics, see the relevant driver documentation.
/*
 * default root collection usage - must be supported
 */
GridFS myFS = new GridFS(myDatabase);             // returns a default GridFS (e.g. "fs" bucket collection)
myFS.storeFile(new File("/tmp/largething.mpg"));  // saves the file into the "fs" GridFS store

/*
 * specified root collection usage - optional
 */
GridFS myContracts = new GridFS(myDatabase, "contracts");                   // returns a GridFS where "contracts" is root
myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf")); // retrieves object whose filename is "smithco"
ObjectId is a 12-byte BSON type, constructed using: a 4-byte value representing the seconds since the Unix epoch, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter, starting with a random value.
In MongoDB, documents stored in a collection require a unique _id field that acts as a primary key. Because ObjectIds are small, most likely unique, and fast to generate, MongoDB uses ObjectIds as the default value for the _id field if the _id field is not specified; i.e., the mongod adds the _id field and generates a unique ObjectId to assign as its value.
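The leading 4 bytes of the ObjectId encode its creation time, which is how getTimestamp() works. The following sketch (plain JavaScript, outside the mongo shell) decodes those bytes from an ObjectId's hexadecimal string:

```javascript
// Sketch: the first 8 hex characters of an ObjectId are a 4-byte big-endian
// count of seconds since the Unix epoch.
function objectIdTimestamp(hex) {
    var seconds = parseInt(hex.substring(0, 8), 16);
    return new Date(seconds * 1000);
}

// The example ObjectId used later in this document decodes to 2012-10-17T20:46:22Z,
// matching the getTimestamp() output shown below.
objectIdTimestamp("507f191e810c19729de860ea");
```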
Using ObjectIds for the _id field, provides the following additional benefits:
Also consider the BSON Documents section for related information on MongoDB’s document orientation.
The mongo shell provides the ObjectId() wrapper class to generate a new ObjectId and to provide the following helper attribute and methods:
str
The hexadecimal string value of the ObjectId() object.
Returns the timestamp portion of the ObjectId() object as a Date.
Returns the string representation of the ObjectId() object. The returned string literal has the format “ObjectId(...)”.
Changed in version 2.2: In previous versions ObjectId.toString() returns the value of the ObjectId as a hexadecimal string.
Returns the value of the ObjectId() object as a hexadecimal string. The returned string is the str attribute.
Changed in version 2.2: In previous versions ObjectId.valueOf() returns the ObjectId() object.
Consider the following uses of the ObjectId() class in the mongo shell:
To generate a new ObjectId, use the ObjectId() constructor with no argument:
x = ObjectId()
In this example, the value of x would be:
ObjectId("507f1f77bcf86cd799439011")
To generate a new ObjectId using the ObjectId() constructor with a unique hexadecimal string:
y = ObjectId("507f191e810c19729de860ea")
In this example, the value of y would be:
ObjectId("507f191e810c19729de860ea")
To return the timestamp of an ObjectId() object, use the getTimestamp() method as follows:
ObjectId("507f191e810c19729de860ea").getTimestamp()
This operation will return the following Date object:
ISODate("2012-10-17T20:46:22Z")
Access the str attribute of an ObjectId() object, as follows:
ObjectId("507f191e810c19729de860ea").str
This operation will return the following hexadecimal string:
507f191e810c19729de860ea
To return the string representation of an ObjectId() object, use the toString() method as follows:
ObjectId("507f191e810c19729de860ea").toString()
This operation will return the following output:
ObjectId("507f191e810c19729de860ea")
To return the value of an ObjectId() object as a hexadecimal string, use the valueOf() method as follows:
ObjectId("507f191e810c19729de860ea").valueOf()
This operation will return the following output:
507f191e810c19729de860ea
Capped collections are fixed-size collections that support high-throughput operations that insert, retrieve, and delete documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.
Capped collections have the following behaviors:
For example, the oplog.rs collection that stores a log of the operations in a replica set uses a capped collection. Consider the following potential use cases for capped collections:
You cannot shard a capped collection.
Capped collections created after 2.2 have an _id field and an index on the _id field by default. Capped collections created before 2.2 do not have an index on the _id field by default. If you are using capped collections with replication prior to 2.2, you should explicitly create an index on the _id field.
You can update documents in a capped collection after inserting them; however, these updates cannot cause the documents to grow. If an update operation causes a document to grow beyond its original size, the update operation will fail.
If you plan to update documents in a capped collection, remember to create an index to prevent update operations that require a table scan.
You cannot delete documents from a capped collection. To remove all records from a capped collection, use the emptycapped command. To remove the collection entirely, use the drop() method.
Warning
If you have a capped collection in a replica set outside of the local database, before 2.2, you should create a unique index on _id. Ensure uniqueness using the unique: true option to the ensureIndex() method or by using an ObjectId for the _id field. Alternately, you can use the autoIndexId option when creating the capped collection, as in the Query a Capped Collection procedure.
You must create capped collections explicitly using the createCollection() method, which is a helper in the mongo shell for the create command. When creating a capped collection you must specify the maximum size of the collection in bytes, which MongoDB will pre-allocate for the collection. The size of the capped collection includes a small amount of space for internal overhead.
db.createCollection("mycoll", {capped:true, size:100000})
See
If you perform a find() on a capped collection with no ordering specified, MongoDB guarantees that the ordering of results is the same as the insertion order.
To retrieve documents in reverse insertion order, issue find() along with the sort() method with the $natural parameter set to -1, as shown in the following example:
db.cappedCollection.find().sort( { $natural: -1 } )
Use the db.collection.isCapped() method to determine if a collection is capped, as follows:
db.collection.isCapped()
You can convert a non-capped collection to a capped collection with the convertToCapped command:
db.runCommand({"convertToCapped": "mycoll", size: 100000});
The size parameter specifies the size of the capped collection in bytes.
Changed in version 2.2: Before 2.2, capped collections did not have an index on _id unless you specified autoIndexId to the create command; after 2.2 this index is created by default.
For additional flexibility when expiring data, consider MongoDB’s TTL indexes, as described in Expire Data from Collections by Setting TTL. These indexes allow you to expire and remove data from normal collections using a special type, based on the value of a date-typed field and a TTL value for the index.
TTL indexes are not compatible with capped collections.
The following documents provide patterns for developing application features:
This document provides a pattern for doing multi-document updates, or “transactions,” using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide rollback-like functionality.
Operations on a single document are always atomic with MongoDB databases; however, operations that involve multiple documents, which are often referred to as “transactions,” are not atomic. Since documents can be fairly complex and contain multiple “nested” documents, single-document atomicity provides necessary support for many practical use cases.
Thus, without precautions, success or failure of the database operation cannot be “all or nothing”: without support for multi-document transactions, some operations in a sequence may succeed while others fail. When executing a transaction composed of several sequential operations, the following issues arise:
Despite the power of single-document atomic operations, there are cases that require multi-document transactions. For these situations, you can use a two-phase commit, to provide support for these kinds of multi-document updates.
Because documents can represent both pending data and states, you can use a two-phase commit to ensure that data is consistent, and that in the case of an error, the state that preceded the transaction is recoverable.
Note
Because only single-document operations are atomic with MongoDB, two-phase commits can only offer transaction-like semantics. Applications may observe intermediate data at intermediate points during the two-phase commit or rollback.
The most common example of a transaction is transferring funds from account A to account B in a reliable way; this pattern uses that operation as its example. In a relational database system, this operation would encapsulate subtracting funds from the source (A) account and adding them to the destination (B) within a single atomic transaction. For MongoDB, you can use a two-phase commit in these situations to achieve a compatible response.
All of the examples in this document use the mongo shell to interact with the database, and assume that you have two collections: a collection named accounts that stores account data, with one account per document, and a collection named transactions that stores the transactions themselves.
Begin by creating two accounts named A and B, with the following command:
db.accounts.save({name: "A", balance: 1000, pendingTransactions: []})
db.accounts.save({name: "B", balance: 1000, pendingTransactions: []})
To verify that these operations succeeded, use find():
db.accounts.find()
mongo will return two documents that resemble the following:
{ "_id" : ObjectId("4d7bc66cb8a04f512696151f"), "name" : "A", "balance" : 1000, "pendingTransactions" : [ ] }
{ "_id" : ObjectId("4d7bc67bb8a04f5126961520"), "name" : "B", "balance" : 1000, "pendingTransactions" : [ ] }
Create the transaction collection by inserting the following document. The transaction document holds the source and destination, which refer to the name fields of the accounts collection, as well as the value field, which represents the amount to change the balance field by. Finally, the state field reflects the current state of the transaction.
db.transactions.save({source: "A", destination: "B", value: 100, state: "initial"})
To verify that these operations succeeded, use find():
db.transactions.find()
This will return a document similar to the following:
{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "source" : "A", "destination" : "B", "value" : 100, "state" : "initial" }
Before modifying either record in the accounts collection, switch the transaction state from initial to pending.
Set the local variable t in your shell session to the transaction document using findOne():
t = db.transactions.findOne({state: "initial"})
After you assign the variable, the shell returns the value of t; you will see the following output:
{
"_id" : ObjectId("4d7bc7a8b8a04f5126961522"),
"source" : "A",
"destination" : "B",
"value" : 100,
"state" : "initial"
}
Use update() to change the value of state to pending:
db.transactions.update({_id: t._id}, {$set: {state: "pending"}})
db.transactions.find()
The find() operation will return the contents of the transactions collection, which should resemble the following:
{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "source" : "A", "destination" : "B", "value" : 100, "state" : "pending" }
Continue by applying the transaction to both accounts. The update() query will prevent you from applying the transaction if the transaction is not already pending. Use the following update() operation:
db.accounts.update({name: t.source, pendingTransactions: {$ne: t._id}}, {$inc: {balance: -t.value}, $push: {pendingTransactions: t._id}})
db.accounts.update({name: t.destination, pendingTransactions: {$ne: t._id}}, {$inc: {balance: t.value}, $push: {pendingTransactions: t._id}})
db.accounts.find()
The find() operation will return the contents of the accounts collection, which should now resemble the following:
{ "_id" : ObjectId("4d7bc97fb8a04f5126961523"), "balance" : 900, "name" : "A", "pendingTransactions" : [ ObjectId("4d7bc7a8b8a04f5126961522") ] }
{ "_id" : ObjectId("4d7bc984b8a04f5126961524"), "balance" : 1100, "name" : "B", "pendingTransactions" : [ ObjectId("4d7bc7a8b8a04f5126961522") ] }
Use the following update() operation to set the transaction’s state to committed:
db.transactions.update({_id: t._id}, {$set: {state: "committed"}})
db.transactions.find()
The find() operation will return the contents of the transactions collection, which should now resemble the following:
{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "destination" : "B", "source" : "A", "state" : "committed", "value" : 100 }
Use the following update() operations to remove the pending transaction from the documents in the accounts collection:
db.accounts.update({name: t.source}, {$pull: {pendingTransactions: t._id}})
db.accounts.update({name: t.destination}, {$pull: {pendingTransactions: t._id}})
db.accounts.find()
The find() operation will return the contents of the accounts collection, which should now resemble the following:
{ "_id" : ObjectId("4d7bc97fb8a04f5126961523"), "balance" : 900, "name" : "A", "pendingTransactions" : [ ] }
{ "_id" : ObjectId("4d7bc984b8a04f5126961524"), "balance" : 1100, "name" : "B", "pendingTransactions" : [ ] }
Complete the transaction by setting the state of the transaction document to done:
db.transactions.update({_id: t._id}, {$set: {state: "done"}})
db.transactions.find()
The find() operation will return the contents of the transactions collection, which should now resemble the following:
{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "destination" : "B", "source" : "A", "state" : "done", "value" : 100 }
The most important part of the transaction procedure is not the prototypical example above, but rather the possibility of recovering from the various failure scenarios that arise when transactions do not complete as intended. This section provides an overview of possible failures and methods to recover from these kinds of events.
There are two classes of failures:
all failures that occur after the first step (i.e. “setting the transaction state to initial”) but before the third step (i.e. “applying the transaction to both accounts.”)
To recover, applications should get a list of transactions in the pending state and resume from the second step (i.e. “switching the transaction state to pending.”)
all failures that occur after the third step (i.e. “applying the transaction to both accounts”) but before the fifth step (i.e. “setting the transaction state to done.”)
To recover, applications should get a list of transactions in the committed state and resume from the fourth step (i.e. “remove the pending transaction.”)
Thus, the application will always be able to resume the transaction and eventually arrive at a consistent state. Run the following recovery operations every time the application starts to catch any unfinished transactions. You may also wish to run the recovery operation at regular intervals to ensure that your data remains consistent.
The time required to reach a consistent state depends on how long the application needs to recover each transaction.
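The recovery steps above can be sketched in plain JavaScript, with in-memory arrays standing in for the accounts and transactions collections; the helper names here are illustrative and not part of MongoDB:

```javascript
// Apply a pending transaction to an account, guarded by the
// pendingTransactions marker so re-running the step is safe (idempotent).
function applyPending(accounts, name, delta, txId) {
  var acct = accounts.find(function (a) { return a.name === name; });
  if (acct && acct.pendingTransactions.indexOf(txId) === -1) {
    acct.balance += delta;
    acct.pendingTransactions.push(txId);
  }
}

// Remove the pending-transaction marker from an account.
function removePending(accounts, name, txId) {
  var acct = accounts.find(function (a) { return a.name === name; });
  if (acct) {
    acct.pendingTransactions = acct.pendingTransactions.filter(function (id) {
      return id !== txId;
    });
  }
}

// Run at application startup: resume each unfinished transaction from the
// step that matches its recorded state.
function recover(accounts, transactions) {
  transactions.forEach(function (t) {
    if (t.state === "pending") {       // resume from "apply to both accounts"
      applyPending(accounts, t.source, -t.value, t._id);
      applyPending(accounts, t.destination, t.value, t._id);
      t.state = "committed";
    }
    if (t.state === "committed") {     // resume from "remove pending markers"
      removePending(accounts, t.source, t._id);
      removePending(accounts, t.destination, t._id);
      t.state = "done";
    }
  });
}
```

Because each step checks the pendingTransactions marker before modifying a balance, re-running the recovery after a crash part-way through leaves the balances correct.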
In some cases you may need to “roll back” or undo a transaction: either because the application needs to “cancel” the transaction, or because it can never recover, as when one of the accounts does not exist or stops existing during the transaction.
There are two possible rollback operations:
Begin by setting the transaction’s state to canceling using the following update() operation:
db.transactions.update({_id: t._id}, {$set: {state: "canceling"}})
Use the following sequence of operations to undo the transaction operation from both accounts:
db.accounts.update({name: t.source, pendingTransactions: t._id}, {$inc: {balance: t.value}, $pull: {pendingTransactions: t._id}})
db.accounts.update({name: t.destination, pendingTransactions: t._id}, {$inc: {balance: -t.value}, $pull: {pendingTransactions: t._id}})
db.accounts.find()
The find() operation will return the contents of the accounts collection, which should resemble the following:
{ "_id" : ObjectId("4d7bc97fb8a04f5126961523"), "balance" : 1000, "name" : "A", "pendingTransactions" : [ ] }
{ "_id" : ObjectId("4d7bc984b8a04f5126961524"), "balance" : 1000, "name" : "B", "pendingTransactions" : [ ] }
Transactions exist, in part, so that several applications can create and run operations concurrently without causing data inconsistency or conflicts. As a result, it is crucial that only one application can handle a given transaction at any point in time.
Consider the following example, with a single transaction (i.e. T1) and two applications (i.e. A1 and A2). If both applications begin processing the transaction while it is still in the initial state (i.e. step 1), then:
To handle multiple applications, create a marker in the transaction document itself to identify the application that is handling the transaction. Use the findAndModify() method to modify the transaction:
t = db.transactions.findAndModify({query: {state: "initial", application: {$exists: 0}},
update: {$set: {state: "pending", application: "A1"}},
new: true})
When you modify and reassign the local shell variable t, the mongo shell will return the t object, which should resemble the following:
{
"_id" : ObjectId("4d7be8af2c10315c0847fc85"),
"application" : "A1",
"destination" : "B",
"source" : "A",
"state" : "pending",
"value" : 150
}
Amend the transaction operations to ensure that only the application whose identifier matches the value of the application field applies the transaction.
If the application A1 fails during transaction execution, you can use the recovery procedures, but applications should ensure that they “own” the transaction before applying it. For example, to resume pending jobs, use a query that resembles the following:
db.transactions.find({application: "A1", state: "pending"})
This will (or may) return a document from the transactions collection that resembles the following:
{ "_id" : ObjectId("4d7be8af2c10315c0847fc85"), "application" : "A1", "destination" : "B", "source" : "A", "state" : "pending", "value" : 150 }
The example transaction above is intentionally simple. For example, it assumes that:
Production implementations would likely be more complex. Typically, accounts need to track information about the current balance, pending credits, and pending debits. Then:
Because all of the changes in the above two operations occur within a single update() operation, these changes are all atomic.
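The point that several field changes made by one update() call apply atomically to a single document can be illustrated with a small in-memory sketch; the field names balance and pendingDebits, and the applyModifiers helper, are assumptions for illustration only:

```javascript
// Apply a mongo-style modifier document to a plain object, mimicking a
// single-document update: all modifiers take effect as one change.
function applyModifiers(doc, mod) {
  Object.keys(mod.$inc || {}).forEach(function (field) {
    doc[field] += mod.$inc[field];        // e.g. decrement the balance
  });
  Object.keys(mod.$push || {}).forEach(function (field) {
    doc[field].push(mod.$push[field]);    // e.g. record the pending debit
  });
  return doc;
}

var account = { name: "A", balance: 1000, pendingDebits: [] };

// One update: debit the balance and record the pending debit together,
// so the two fields can never disagree.
applyModifiers(account, {
  $inc: { balance: -100 },
  $push: { pendingDebits: "tx1" }
});
```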
Additionally, for most important transactions, ensure that:
Write operations are atomic on the level of a single document: no single write operation can atomically affect more than one document or more than one collection.
When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record.
No other operations are atomic; however, you can isolate a single write operation that affects multiple documents using the isolation operator.
This document describes one method of updating documents only if the local copy of the document reflects the current state of the document in the database. In addition, the following methods provide a way to manage isolated sequences of operations:
In this pattern, you will:
Consider the following example in JavaScript which attempts to update the qty field of a document in the products collection:
var myCollection = db.products;
var myDocument = myCollection.findOne( { sku: 'abc123' } );
if (myDocument) {
var oldQty = myDocument.qty;
if (myDocument.qty < 10) {
myDocument.qty *= 4;
} else if ( myDocument.qty < 20 ) {
myDocument.qty *= 3;
} else {
myDocument.qty *= 2;
}
myCollection.update(
{
_id: myDocument._id,
qty: oldQty
},
{
$set: { qty: myDocument.qty }
}
)
var err = db.getLastErrorObj();
if ( err && err.code ) {
print("unexpected error updating document: " + tojson( err ));
} else if ( err.n == 0 ) {
print("No update: no matching document for { _id: " + myDocument._id + ", qty: " + oldQty + " }")
}
}
Your application may require some modifications of this pattern, such as:
MongoDB reserves the _id field in the top level of all documents as a primary key. _id must be unique, and always has an index with a unique constraint. However, except for the unique constraint you can use any value for the _id field in your collections. This tutorial describes two methods for creating an incrementing sequence number for the _id field using the following:
Warning
Generally in MongoDB, you would not use an auto-increment pattern for the _id field, or any field, because it does not scale for databases with larger numbers of documents. Typically, the default ObjectId value is a better choice for the _id field.
Use a separate counters collection to track the last sequence number used. The _id field contains the sequence name and the seq field contains the last value of the sequence.
Insert the initial value for the userid sequence into the counters collection:
db.counters.insert(
{
_id: "userid",
seq: 0
}
)
Create a getNextSequence function that accepts a name of the sequence. The function uses the findAndModify() method to atomically increment the seq value and return this new value:
function getNextSequence(name) {
var ret = db.counters.findAndModify(
{
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
}
);
return ret.seq;
}
Use this getNextSequence() function during insert().
db.users.insert(
{
_id: getNextSequence("userid"),
name: "Sarah C."
}
)
db.users.insert(
{
_id: getNextSequence("userid"),
name: "Bob D."
}
)
You can verify the results with find():
db.users.find()
The _id fields contain incrementing sequence values:
{
_id : 1,
name : "Sarah C."
}
{
_id : 2,
name : "Bob D."
}
In this pattern, an Optimistic Loop calculates the incremented _id value and attempts to insert a document with the calculated _id value. If the insert is successful, the loop ends. Otherwise, the loop will iterate through possible _id values until the insert is successful.
Create a function named insertDocument that performs the “insert if not present” loop. The function wraps the insert() method and takes doc and targetCollection arguments.
function insertDocument(doc, targetCollection) {
while (1) {
var cursor = targetCollection.find( {}, { _id: 1 } ).sort( { _id: -1 } ).limit(1);
var seq = cursor.hasNext() ? cursor.next()._id + 1 : 1;
doc._id = seq;
targetCollection.insert(doc);
var err = db.getLastErrorObj();
if( err && err.code ) {
if( err.code == 11000 /* dup key */ )
continue;
else
print( "unexpected error inserting data: " + tojson( err ) );
}
break;
}
}
The while (1) loop performs the following actions:
Use the insertDocument() function to perform an insert:
var myCollection = db.users2;
insertDocument(
{
name: "Grace H."
},
myCollection
);
insertDocument(
{
name: "Ted R."
},
myCollection
)
You can verify the results with find():
db.users2.find()
The _id fields contain incrementing sequence values:
{
_id: 1,
name: "Grace H."
}
{
_id : 2,
"name" : "Ted R."
}
The while loop may iterate many times in collections with larger insert volumes.
New in version 2.2.
This document provides an introduction to MongoDB’s “time to live” or “TTL” collection feature. Implemented as a special index type, TTL collections make it possible to store data in MongoDB and have the mongod automatically remove data after a specified period of time. This is ideal for types of information like machine-generated event data, logs, and session information that only need to persist in a database for a limited period of time.
Collections expire by way of a special index that keeps track of insertion time in conjunction with a background thread in mongod that regularly removes expired documents from the collection. You can use this feature to expire data from replica sets and sharded clusters.
Use the expireAfterSeconds option to the ensureIndex method in conjunction with a TTL value in seconds to create an expiring collection. TTL collections set the usePowerOf2Sizes collection flag, which means MongoDB must allocate more disk space relative to data size. This approach helps mitigate the possibility of storage fragmentation caused by frequent delete operations and leads to more predictable storage use patterns.
Note
When the TTL thread is active, you will see a delete operation in the output of db.currentOp() or in the data collected by the database profiler.
Consider the following limitations:
Note
TTL indexes expire data by removing documents in a background task that runs once a minute. As a result, the TTL index provides no guarantees that expired documents will not exist in the collection. Consider that:
To set a TTL on the collection “log.events” for one hour use the following command at the mongo shell:
db.log.events.ensureIndex( { "status": 1 }, { expireAfterSeconds: 3600 } )
The status field must hold date/time information. MongoDB will automatically delete documents from this collection once the value of status is one or more hours old.
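The timing arithmetic behind expiry can be sketched as follows; this is an illustration of the rule, not MongoDB’s implementation, and the isExpired helper is hypothetical:

```javascript
// A document is eligible for deletion once the value of its indexed date
// field is at least expireAfterSeconds in the past.
function isExpired(indexedDate, expireAfterSeconds, now) {
  return now.getTime() - indexedDate.getTime() >= expireAfterSeconds * 1000;
}

var now = new Date("2012-12-01T12:00:00Z");
isExpired(new Date("2012-12-01T10:30:00Z"), 3600, now);  // true: 90 minutes old
isExpired(new Date("2012-12-01T11:30:00Z"), 3600, now);  // false: 30 minutes old
```

Note that, because the background task runs only once a minute, a document may remain in the collection briefly after this rule first holds for it.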
The TTL background thread only runs on primary members of replica sets. Secondary members replicate deletion operations from the primary.
If your application needs to perform queries on the content of a field that holds text, you can perform exact matches on the text or use $regex to perform regular expression pattern matches. However, for many operations on text, these methods do not satisfy application requirements.
This pattern describes one method for supporting keyword search in MongoDB: store keywords in an array in the same document as the text field. Combined with a multi-key index, this pattern can support an application’s keyword search operations.
Note
Keyword search is not the same as text search or full text search, and does not provide stemming or other text-processing features. See the Limitations of Keyword Indexes section for more information.
To add structures to your document to support keyword-based queries, create an array field in your documents and add the keywords as strings in the array. You can then create a multi-key index on the array and create queries that select values from the array.
Example
Suppose you have a collection of library volumes that you want to make searchable by topics. For each volume, you add the array topics, and you add as many keywords as needed for a given volume.
For the Moby-Dick volume you might have the following document:
{ title : "Moby-Dick" ,
author : "Herman Melville" ,
published : 1851 ,
ISBN : 0451526996 ,
topics : [ "whaling" , "allegory" , "revenge" , "American" ,
"novel" , "nautical" , "voyage" , "Cape Cod" ]
}
You then create a multi-key index on the topics array:
db.volumes.ensureIndex( { topics: 1 } )
The multi-key index creates separate index entries for each keyword in the topics array. For example the index contains one entry for whaling and another for allegory.
You then query based on the keywords. For example:
db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
Note
An array with a large number of elements, such as one with several hundred or thousands of keywords, will incur greater indexing costs on insertion.
MongoDB can support keyword searches using specific data models and multi-key indexes; however, these keyword indexes are not sufficient or comparable to full-text products in the following respects:
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how you model data can affect application performance and database capacity. See Data Modeling Considerations for MongoDB Applications for a full, high-level overview of data modeling in MongoDB.
This document describes a data model that uses embedded documents to describe relationships between connected data.
Consider the following example that maps patron and address relationships. The example illustrates the advantage of embedding over referencing if you need to view one data entity in context of the other. In this one-to-one relationship between patron and address data, the address belongs to the patron.
In the normalized data model, the address contains a reference to the parent.
{
_id: "joe",
name: "Joe Bookreader"
}
{
patron_id: "joe",
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: 12345
}
If the address data is frequently retrieved with the name information, then with referencing, your application needs to issue multiple queries to resolve the reference. The better data model would be to embed the address data in the patron data, as in the following document:
{
_id: "joe",
name: "Joe Bookreader",
address: {
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: 12345
}
}
With the embedded data model, your application can retrieve the complete patron information with one query.
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how you model data can affect application performance and database capacity. See Data Modeling Considerations for MongoDB Applications for a full, high-level overview of data modeling in MongoDB.
This document describes a data model that uses embedded documents to describe relationships between connected data.
Consider the following example that maps patron and multiple address relationships. The example illustrates the advantage of embedding over referencing if you need to view many data entities in context of another. In this one-to-many relationship between patron and address data, the patron has multiple address entities.
In the normalized data model, the address contains a reference to the parent.
{
_id: "joe",
name: "Joe Bookreader"
}
{
patron_id: "joe",
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: 12345
}
{
patron_id: "joe",
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: 12345
}
If your application frequently retrieves the address data with the name information, then your application needs to issue multiple queries to resolve the references. A more optimal schema would be to embed the address data entities in the patron data, as in the following document:
{
_id: "joe",
name: "Joe Bookreader",
addresses: [
{
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: 12345
},
{
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: 12345
}
]
}
With the embedded data model, your application can retrieve the complete patron information with one query.
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how you model data can affect application performance and database capacity. See Data Modeling Considerations for MongoDB Applications for a full, high-level overview of data modeling in MongoDB.
This document describes a data model that uses references between documents to describe relationships between connected data.
Consider the following example that maps publisher and book relationships. The example illustrates the advantage of referencing over embedding to avoid repetition of the publisher information.
Embedding the publisher document inside the book document would lead to repetition of the publisher data, as the following documents show:
{
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}
{
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}
To avoid repetition of the publisher data, use references and keep the publisher information in a separate collection from the book collection.
When using references, the growth of the relationships determines where to store the reference. If the number of books per publisher is small with limited growth, storing the book references inside the publisher document may sometimes be useful. Otherwise, if the number of books per publisher is unbounded, this data model would lead to mutable, growing arrays, as in the following example:
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [123456789, 234567890, ...]
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English"
}
To avoid mutable, growing arrays, store the publisher reference inside the book document:
{
_id: "oreilly",
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher_id: "oreilly"
}
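With this model, the application performs the join itself: one query for the book and a second to resolve its publisher_id reference. A sketch in plain JavaScript, with in-memory arrays standing in for the publishers and books collections, and a hypothetical findBookWithPublisher helper:

```javascript
var publishers = [
  { _id: "oreilly", name: "O'Reilly Media", founded: 1980, location: "CA" }
];
var books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", publisher_id: "oreilly" }
];

function findBookWithPublisher(bookId) {
  // First lookup: fetch the book document.
  var book = books.find(function (b) { return b._id === bookId; });
  if (!book) return null;
  // Second lookup: resolve the publisher_id reference.
  var publisher = publishers.find(function (p) { return p._id === book.publisher_id; });
  return { book: book, publisher: publisher };
}
```

The publisher data is stored once, at the cost of the second lookup whenever the application needs both entities together.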
Consider the following example that keeps a library book and its checkout information. The example illustrates how embedding fields related to an atomic update within the same document ensures that the fields are in sync.
Consider the following book document that stores the number of available copies for checkout and the current checkout information:
book = {
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly",
available: 3,
checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
}
You can use the db.collection.findAndModify() method to atomically determine if a book is available for checkout and update with the new checkout information. Embedding the available field and the checkout field within the same document ensures that the updates to these fields are in sync:
db.books.findAndModify( {
query: {
_id: 123456789,
available: { $gt: 0 }
},
update: {
$inc: { available: -1 },
$push: { checkout: { by: "abc", date: new Date() } }
}
} )
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how you model data can affect application performance and database capacity. See Data Modeling Considerations for MongoDB Applications for a full, high-level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing references to “parent” nodes in children nodes.
The Parent References pattern stores each tree node in a document; in addition to the tree node, the document stores the id of the node’s parent.
Consider the following example that models a tree of categories using Parent References:
db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "Postgres", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
The query to retrieve the parent of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).parent
You can create an index on the field parent to enable fast search by the parent node:
db.categories.ensureIndex( { parent: 1 } )
You can query by the parent field to find its immediate children nodes:
db.categories.find( { parent: "Databases" } )
The Parent Links pattern provides a simple solution to tree storage, but requires multiple queries to retrieve subtrees.
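The multi-query subtree retrieval can be sketched as follows, with an in-memory array standing in for the categories collection; each recursion level corresponds to one additional query by parent, and the findSubtree helper is illustrative:

```javascript
var categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "Postgres", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null }
];

// Collect every descendant of a node: one "query" by parent per tree level.
function findSubtree(root) {
  var children = categories.filter(function (c) { return c.parent === root; });
  return children.reduce(function (all, child) {
    return all.concat(child._id, findSubtree(child._id));  // recurse per child
  }, []);
}
```

A subtree of depth n requires n rounds of queries, which is why this pattern suits trees where you mostly navigate one level at a time.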
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how you model data can affect application performance and database capacity. See Data Modeling Considerations for MongoDB Applications for a full, high-level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing references in the parent-nodes to children nodes.
The Child References pattern stores each tree node in a document; in addition to the tree node, the document stores the id(s) of the node’s children in an array.
Consider the following example that models a tree of categories using Child References:
db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "Postgres", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "Postgres" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
The query to retrieve the immediate children of a node is fast and straightforward:
db.categories.findOne( { _id: "Databases" } ).children
You can create an index on the field children to enable fast search by the child nodes:
db.categories.ensureIndex( { children: 1 } )
You can query for a node in the children field to find its parent node as well as its siblings:
db.categories.find( { children: "MongoDB" } )
The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple parents.
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how you model data can affect application performance and database capacity. See Data Modeling Considerations for MongoDB Applications for a full, high-level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents using references to parent nodes and an array that stores all ancestors.
The Array of Ancestors pattern stores each tree node in a document; in addition to the tree node, the document stores the id(s) of the node’s ancestors, or path, in an array.
Consider the following example that models a tree of categories using Array of Ancestors:
db.categories.insert( { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Postgres", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Programming", ancestors: [ "Books" ], parent: "Books" } )
db.categories.insert( { _id: "Books", ancestors: [ ], parent: null } )
The query to retrieve the ancestors or path of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).ancestors
You can create an index on the field ancestors to enable fast search by the ancestors nodes:
db.categories.ensureIndex( { ancestors: 1 } )
You can query by the ancestors to find all its descendants:
db.categories.find( { ancestors: "Programming" } )
The Array of Ancestors pattern provides a fast and efficient solution to find the descendants and the ancestors of a node by creating an index on the elements of the ancestors field. This makes Array of Ancestors a good choice for working with subtrees.
The Array of Ancestors pattern is slightly slower than the Materialized Paths pattern but is more straightforward to use.
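The two queries above can be modeled in a few lines of Python over an in-memory copy of the collection; the helper names are illustrative, not driver APIs:

```python
# In-memory stand-in for the categories collection from this page.
categories = [
    {"_id": "MongoDB", "ancestors": ["Books", "Programming", "Databases"], "parent": "Databases"},
    {"_id": "Postgres", "ancestors": ["Books", "Programming", "Databases"], "parent": "Databases"},
    {"_id": "Databases", "ancestors": ["Books", "Programming"], "parent": "Programming"},
    {"_id": "Languages", "ancestors": ["Books", "Programming"], "parent": "Programming"},
    {"_id": "Programming", "ancestors": ["Books"], "parent": "Books"},
    {"_id": "Books", "ancestors": [], "parent": None},
]

def find_descendants(node_id):
    """Mirror of db.categories.find( { ancestors: node_id } )."""
    return sorted(d["_id"] for d in categories if node_id in d["ancestors"])

def find_ancestors(node_id):
    """Mirror of db.categories.findOne( { _id: node_id } ).ancestors."""
    return next(d["ancestors"] for d in categories if d["_id"] == node_id)
```

Both operations resolve in a single query because every document carries its full path, which is the trade-off this pattern makes against update cost when a node moves.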
This document describes a data model for a tree-like structure in MongoDB documents by storing full relationship paths between documents.
The Materialized Paths pattern stores each tree node in a document; in addition to the tree node, the document stores as a string the id(s) of the node’s ancestors or path. Although the Materialized Paths pattern requires additional steps of working with strings and regular expressions, the pattern also provides more flexibility in working with the path, such as finding nodes by partial paths.
Consider the following example that models a tree of categories using Materialized Paths; the path string uses the comma (,) as a delimiter:
db.categories.insert( { _id: "Books", path: null } )
db.categories.insert( { _id: "Programming", path: "Books," } )
db.categories.insert( { _id: "Databases", path: "Books,Programming," } )
db.categories.insert( { _id: "Languages", path: "Books,Programming," } )
db.categories.insert( { _id: "MongoDB", path: "Books,Programming,Databases," } )
db.categories.insert( { _id: "Postgres", path: "Books,Programming,Databases," } )
You can query to retrieve the whole tree, sorting by the path:
db.categories.find().sort( { path: 1 } )
You can create an index on the field path to enable fast search by the path:
db.categories.ensureIndex( { path: 1 } )
You can use regular expressions on the path field to find the descendants of Programming:
db.categories.find( { path: /,Programming,/ } )
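The same regular-expression matching can be sketched in Python over the path strings above; this in-memory model also shows how anchoring the pattern restricts results to direct children:

```python
import re

# Path strings exactly as stored in the examples above.
docs = [
    {"_id": "Books", "path": None},
    {"_id": "Programming", "path": "Books,"},
    {"_id": "Databases", "path": "Books,Programming,"},
    {"_id": "Languages", "path": "Books,Programming,"},
    {"_id": "MongoDB", "path": "Books,Programming,Databases,"},
    {"_id": "Postgres", "path": "Books,Programming,Databases,"},
]

descendants_re = re.compile(r",Programming,")   # /,Programming,/
all_descendants = [d["_id"] for d in docs
                   if d["path"] and descendants_re.search(d["path"])]

# Anchoring the pattern to the full path restricts the match to direct children:
children_re = re.compile(r"^Books,Programming,$")
direct_children = [d["_id"] for d in docs
                   if d["path"] and children_re.match(d["path"])]
```

The delimiter on both sides of each id is what makes the unanchored pattern safe against ids that are substrings of other ids.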
This document describes a data model for a tree-like structure that optimizes discovering subtrees at the expense of tree mutability.
The Nested Sets pattern identifies each node in the tree as stops in a round-trip traversal of the tree. The application visits each node in the tree twice; first during the initial trip, and second during the return trip. The Nested Sets pattern stores each tree node in a document; in addition to the tree node, the document stores the id of the node’s parent, the node’s initial stop in the left field, and its return stop in the right field.
Consider the following example that models a tree of categories using Nested Sets:
db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "Postgres", parent: "Databases", left: 8, right: 9 } )
You can query to retrieve the descendants of a node:
var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } );
The Nested Sets pattern provides a fast and efficient solution for finding subtrees but is inefficient for modifying the tree structure. As such, this pattern is best for static trees that do not change.
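The left and right stops come from a depth-first, round-trip traversal of the tree. The following Python sketch (an in-memory illustration, not driver code) reproduces the numbering in the example above and uses it to find a subtree:

```python
def number_tree(tree, root):
    """Assign left/right stops with a DFS; tree maps node -> list of children."""
    docs = {}
    stop = [0]
    def visit(node):
        stop[0] += 1
        left = stop[0]              # initial stop
        for child in tree.get(node, []):
            visit(child)
        stop[0] += 1                # return stop
        docs[node] = {"left": left, "right": stop[0]}
    visit(root)
    return docs

tree = {
    "Books": ["Programming"],
    "Programming": ["Languages", "Databases"],
    "Databases": ["MongoDB", "Postgres"],
}
stops = number_tree(tree, "Books")

def subtree(stops, node):
    """Descendants are exactly the nodes nested between a node's stops."""
    lo, hi = stops[node]["left"], stops[node]["right"]
    return sorted(n for n, s in stops.items() if lo < s["left"] and s["right"] < hi)
```

Note that inserting or moving a node requires renumbering stops across much of the tree, which is the mutability cost described above.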
See also
The introductory “Tutorial” in the MongoDB wiki and the “Mongo Shell” wiki pages for more information on the mongo shell.
Consider the following reference material that addresses the mongo shell and its interface:
The use case documents provide introductions to the patterns, design, and operation used in application development with MongoDB. Each document provides more concrete examples and implementation details to support core MongoDB use cases. These documents highlight application design, and data modeling strategies (i.e. schema design) for MongoDB with special attention to pragmatic considerations including indexing, performance, sharding, and scaling. Each document is distinct and can stand alone; however, each section builds on a set of common topics.
The operational intelligence case studies describe applications that collect machine generated data from logging systems, application output, and other systems. The product data management case studies address aspects of applications required for building product catalogs, and managing inventory in e-commerce systems. The content management case studies introduce basic patterns and techniques for building content management systems using MongoDB.
Finally, the introductory application development tutorial with Python and MongoDB provides a complete and fully developed application that you can build using MongoDB and popular Python web development tool kits.
As an introduction to the use of MongoDB for operational intelligence and real-time analytics, the “Storing Log Data” document describes several approaches to modeling and storing machine generated data with MongoDB. Then, “Pre-Aggregated Reports” describes methods and strategies for processing data to generate aggregated reports from raw event-data. Finally, “Hierarchical Aggregation” presents a method for using MongoDB to process and store hierarchical reports (i.e. per-minute, per-hour, and per-day) from raw event data.
This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for log data from servers and other machine data.
Servers generate a large number of events (i.e. logging) that contain useful information about their operation, including errors, warnings, and user behavior. By default, most servers store these data in plain text log files on their local file systems.
While plain-text logs are accessible and human-readable, they are difficult to use, reference, and analyze without holistic systems for aggregating and storing these data.
The solution described below assumes that each server that generates events also consumes event data and that each server can access the MongoDB instance. Furthermore, this design assumes that the query rate for this logging data is substantially lower than is common for logging applications with a high-bandwidth event stream.
Note
This case assumes that you’re using a standard uncapped collection for this event data, unless otherwise noted. See the section on capped collections.
The schema for storing log data in MongoDB depends on the format of the event data that you’re storing. For a simple example, consider standard request logs in the combined format from the Apache HTTP Server. A line from these logs may resemble the following:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
The simplest approach to storing the log data would be putting the exact text of the log record into a document:
{
_id: ObjectId('4f442120eb03305789000000'),
line: '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
}
While this solution does capture all data in a format that MongoDB can use, the data is neither particularly useful nor efficient to query: if you need to find events associated with a specific page, you would need to use a regular expression query, which would require a full scan of the collection. The preferred approach is to extract the relevant information from the log data into individual fields in a MongoDB document.
When you extract data from the log into fields, pay attention to the data types you use to render the log data into MongoDB.
As you design this schema, be mindful that the data types you use to encode the data can have a significant impact on the performance and capability of the logging system. Consider the date field: In the above example, [10/Oct/2000:13:55:36 -0700] is 28 bytes long. If you store this with the UTC timestamp type, you can convey the same information in only 8 bytes.
Additionally, using proper types for your data also increases query flexibility: if you store date as a timestamp you can make date range queries, whereas it’s very difficult to compare two strings that represent dates. The same issue holds for numeric fields; storing numbers as strings requires more space and is difficult to query.
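As an illustration of the conversion, the following Python sketch parses the Apache timestamp string into a timezone-aware datetime, which a driver would then store as an 8-byte BSON date:

```python
from datetime import datetime, timezone

# The bracketed timestamp from the combined-format log line above.
raw = "10/Oct/2000:13:55:36 -0700"
local = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
utc = local.astimezone(timezone.utc)   # 2000-10-10T20:55:36Z
```

Storing the parsed value rather than the string enables range queries like `{'time': {'$gte': ..., '$lt': ...}}` shown later in this document.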
Consider the following document that captures all data from the above log entry:
{
_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
logname: null,
user: 'frank',
time: ISODate("2000-10-10T20:55:36Z"),
path: "/apache_pb.gif",
request: "GET /apache_pb.gif HTTP/1.0",
status: 200,
response_size: 2326,
referrer: "http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
When extracting data from logs and designing a schema, also consider what information you can omit from your log tracking system. In most cases there’s no need to track all data from an event log, and you can omit other fields. To continue the above example, here the most crucial information may be the host, time, path, user agent, and referrer, as in the following example document:
{
_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
time: ISODate("2000-10-10T20:55:36Z"),
path: "/apache_pb.gif",
referer: "http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
You may also consider omitting explicit time fields, because the ObjectId embeds creation time:
{
_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
path: "/apache_pb.gif",
referer: "http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
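The first four bytes of an ObjectId encode its creation time as a Unix timestamp in seconds, which is what makes the explicit time field redundant. The following dependency-free Python sketch extracts it (PyMongo exposes the same value as ObjectId.generation_time):

```python
from datetime import datetime, timezone

def objectid_time(oid_hex):
    """Extract the creation time embedded in an ObjectId's first 4 bytes."""
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_time("4f442120eb03305789000000")
```

The trade-off is second-level resolution and a slightly less obvious schema; keep the explicit field if you need sub-second precision or clarity for ad-hoc queries.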
The primary performance concerns for event logging systems are:
how many inserts per second can it support, which limits the event throughput, and
how will the system manage the growth of event data, particularly concerning a growth in insert activity.
In most cases the best way to increase the capacity of the system is to use an architecture with some sort of partitioning or sharding that distributes writes among a cluster of systems.
Insertion speed is the primary performance concern for an event logging system. At the same time, the system must be able to support flexible queries so that you can return data from the system efficiently. This section describes procedures for both document insertion and basic analytics queries.
The examples that follow use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
MongoDB has a configurable write concern. This capability allows you to balance the importance of guaranteeing that all writes are fully recorded in the database with the speed of the insert.
For example, if you issue writes to MongoDB and do not require that the database issue any response, the write operations will return very fast (i.e. asynchronously,) but you cannot be certain that all writes succeeded. Conversely, if you require that MongoDB acknowledge every write operation, the database will not return as quickly but you can be certain that every item will be present in the database.
The proper write concern is often an application specific decision, and depends on the reporting requirements and uses of your analytics application.
The following example contains the setup for a Python console session using PyMongo, with an event from the Apache Log:
>>> import bson
>>> import pymongo
>>> from datetime import datetime
>>> conn = pymongo.Connection()
>>> db = conn.event_db
>>> event = {
... '_id': bson.ObjectId(),
... 'host': "127.0.0.1",
... 'time': datetime(2000,10,10,20,55,36),
... 'path': "/apache_pb.gif",
... 'referer': "http://www.example.com/start.html",
... 'user_agent': "Mozilla/4.08 [en] (Win98; I ;Nav)"
... }
The following command will insert the event object into the events collection.
>>> db.events.insert(event, w=0)
By setting w=0, you do not require that MongoDB acknowledges receipt of the insert. Although very fast, this is risky because the application cannot detect network and server failures. See Write Concern for more information.
If you want to ensure that MongoDB acknowledges inserts, you can pass w=1 argument as follows:
>>> db.events.insert(event, w=1)
MongoDB also supports a more stringent level of write concern, if you have a lower tolerance for data loss:
To ensure that MongoDB not only acknowledges receipt of the write but also commits the operation to the on-disk journal before returning successfully to the application, use the following insert() operation:
>>> db.events.insert(event, j=True)
Note
j=True implies w=1.
Finally, if you have extremely low tolerance for event data loss, you can require that MongoDB replicate the data to multiple secondary replica set members before returning:
>>> db.events.insert(event, w='majority')
This will force your application to wait for acknowledgment that the data has replicated to a majority of the configured members of the replica set. You can combine options as well:
>>> db.events.insert(event, j=True, w='majority')
In this case, your application will wait for a successful journal commit on the primary and a replication acknowledgment from a majority of configured secondaries. This is the safest option presented in this section, but it is the slowest. There is always a trade-off between safety and speed.
Note
If possible, consider using bulk inserts to insert event data.
All write concern options apply to bulk inserts, but you can pass multiple events to the insert() method at once. Batch inserts allow MongoDB to distribute the performance penalty incurred by more stringent write concern across a group of inserts.
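A minimal sketch of the batching pattern in Python; `batches` is a hypothetical helper, and the commented insert() call shows how you might use each batch with PyMongo:

```python
def batches(events, size=1000):
    """Yield lists of up to `size` events for bulk insertion."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch   # final partial batch

# for batch in batches(event_stream):
#     db.events.insert(batch, w=1)   # one round-trip and one write-concern wait per batch
```

Each batch pays the write-concern latency once instead of once per event, which is where the performance gain comes from.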
See also
The value in maintaining a collection of event data derives from being able to query that data to answer specific questions. You may have a number of simple queries to analyze these data.
As an example, you may want to return all of the events associated with a specific value of a field. Extending the Apache access log example from above, a common case would be to query for all events with a specific value in the path field. This section contains a pattern for returning data and optimizing this operation.
Use a query that resembles the following to return all documents with the /apache_pb.gif value in the path field:
>>> q_events = db.events.find({'path': '/apache_pb.gif'})
Adding an index on the path field would significantly enhance the performance of this operation.
>>> db.events.ensure_index('path')
Because the values of the path likely have a random distribution, in order to operate efficiently, the entire index should be resident in RAM. In this case, the number of distinct paths is typically small in relation to the number of documents, which will limit the space that the index requires.
If your system has a limited amount of RAM, or your data set has a wider distribution in values, you may need to revisit your indexing strategy. In most cases, however, this index is entirely sufficient.
See also
The db.collection.ensureIndex() JavaScript method and the db.events.ensure_index() method in PyMongo.
The next example describes the process for returning all the events for a particular date.
To retrieve this data, use the following query:
>>> q_events = db.events.find({'time':
... { '$gte':datetime(2000,10,10),'$lt':datetime(2000,10,11)}})
In this case, an index on the time field would optimize performance:
>>> db.events.ensure_index('time')
Because your application is inserting events in order, the parts of the index that capture recent events will always be in active RAM. As a result, if you query primarily on recent data, MongoDB will be able to maintain a large index, quickly fulfill queries, and avoid using much system memory.
See also
The db.events.ensureIndex() JavaScript method and the db.events.ensure_index() method in PyMongo.
The following example describes a more complex query for returning all events in the collection for a particular host on a particular date. This kind of analysis may be useful for investigating suspicious behavior by a specific user.
Use a query that resembles the following:
>>> q_events = db.events.find({
... 'host': '127.0.0.1',
... 'time': {'$gte':datetime(2000,10,10),'$lt':datetime(2000,10,11)}
... })
This query selects documents from the events collection where the host field is 127.0.0.1 (i.e. local host), and the value of the time field represents a date that is on or after (i.e. $gte) 2000-10-10 but before (i.e. $lt) 2000-10-11.
The indexes you use may have significant implications for the performance of these kinds of queries. For instance, you can create a compound index on the time and host field, using the following command:
>>> db.events.ensure_index([('time', 1), ('host', 1)])
To analyze the performance for the above query using this index, issue the q_events.explain() method in a Python console. This will return something that resembles:
{ ...
u'cursor': u'BtreeCursor time_1_host_1',
u'indexBounds': {u'host': [[u'127.0.0.1', u'127.0.0.1']],
u'time': [
[ datetime.datetime(2000, 10, 10, 0, 0),
datetime.datetime(2000, 10, 11, 0, 0)]]
},
...
u'millis': 4,
u'n': 11,
u'nscanned': 1296,
u'nscannedObjects': 11,
... }
This query had to scan 1296 items from the index to return 11 objects in 4 milliseconds. Conversely, you can test a different compound index with the host field first, followed by the time field. Create this index using the following operation:
>>> db.events.ensure_index([('host', 1), ('time', 1)])
Use the q_events.explain() operation to test the performance:
{ ...
u'cursor': u'BtreeCursor host_1_time_1',
u'indexBounds': {u'host': [[u'127.0.0.1', u'127.0.0.1']],
u'time': [[datetime.datetime(2000, 10, 10, 0, 0),
datetime.datetime(2000, 10, 11, 0, 0)]]},
...
u'millis': 0,
u'n': 11,
...
u'nscanned': 11,
u'nscannedObjects': 11,
...
}
Here, the query had to scan only 11 items from the index before returning 11 objects in less than a millisecond. By placing the more selective element of your query first in a compound index, you may be able to build more efficient queries.
Note
Although the index order has an impact on query performance, remember that index scans are much faster than collection scans; depending on your usage profile, it may make more sense to use the { time: 1, host: 1 } index to serve your other queries.
See also
The db.events.ensureIndex() JavaScript method and the db.events.ensure_index() method in PyMongo.
The following example describes the process for using the collection of Apache access events to determine the number of requests per resource (i.e. page) per day in the last month.
New in version 2.1.
The aggregation framework provides the capacity for queries that select, process, and aggregate results from large numbers of documents. The aggregate() method (and the aggregate command) offers greater flexibility and capacity with less complexity than the existing mapReduce and group aggregation commands.
Consider the following aggregation pipeline: [1]
>>> result = db.command('aggregate', 'events', pipeline=[
... { '$match': {
... 'time': {
... '$gte': datetime(2000,10,1),
... '$lt': datetime(2000,11,1) } } },
... { '$project': {
... 'path': 1,
... 'date': {
... 'y': { '$year': '$time' },
... 'm': { '$month': '$time' },
... 'd': { '$dayOfMonth': '$time' } } } },
... { '$group': {
... '_id': {
... 'p':'$path',
... 'y': '$date.y',
... 'm': '$date.m',
... 'd': '$date.d' },
... 'hits': { '$sum': 1 } } },
... ])
This command aggregates documents from the events collection with a pipeline that:
Uses the $match to limit the documents that the aggregation framework must process. $match is similar to a find() query.
This operation selects all documents where the value of the time field represents a date that is on or after (i.e. $gte) 2000-10-01 but before (i.e. $lt) 2000-11-01.
Uses the $project to limit the data that continues through the pipeline. This operator:
Uses the $group to create new computed documents. This step will create a single new document for each unique path/date combination. The documents take the following form:
Note
In sharded environments, the performance of aggregation operations depends on the shard key. Ideally, all the items in a particular $group operation will reside on the same server.
While this distribution of documents would occur if you chose the time field as the shard key, a field like path also has this property and is a typical choice for sharding. Also see the sharding considerations section of this document for additional recommendations for using sharding.
See also
| [1] | To translate statements from the aggregation framework to SQL, you can consider the $match equivalent to WHERE, $project to SELECT, and $group to GROUP BY. |
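To make the pipeline's behavior concrete, the following pure-Python sketch computes the same per-path, per-day hit counts in memory. It models the semantics only, not how MongoDB executes the pipeline:

```python
from collections import Counter
from datetime import datetime

def hits_per_path_per_day(events, start, end):
    """In-memory equivalent of the $match/$project/$group pipeline above."""
    hits = Counter()
    for e in events:
        if start <= e["time"] < end:                   # $match on the date range
            t = e["time"]                              # $project the date parts
            key = (e["path"], t.year, t.month, t.day)  # $group _id
            hits[key] += 1                             # 'hits': { '$sum': 1 }
    return hits

events = [
    {"path": "/apache_pb.gif", "time": datetime(2000, 10, 10, 20, 55)},
    {"path": "/apache_pb.gif", "time": datetime(2000, 10, 10, 21, 5)},
    {"path": "/index.html", "time": datetime(2000, 10, 11, 9, 0)},
    {"path": "/index.html", "time": datetime(2000, 11, 2, 9, 0)},  # outside the range
]
result = hits_per_path_per_day(events, datetime(2000, 10, 1), datetime(2000, 11, 1))
```

Each key of the resulting counter corresponds to one output document's _id, and each count to its hits value.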
To optimize the aggregation operation, ensure that the initial $match query has an index. Use the following command to create an index on the time field in the events collection:
>>> db.events.ensure_index('time')
Note
If you have already created a compound index on the time and host fields (i.e. { time: 1, host: 1 },) MongoDB will use this index for range queries on just the time field. Do not create an additional index in these situations.
Eventually your system’s events will exceed the capacity of a single event logging database instance. In these situations you will want to use a sharded cluster, which takes advantage of MongoDB’s sharding functionality. This section introduces the unique sharding concerns for this event logging case.
See also
“FAQ: Sharding with MongoDB” and the “Sharding” wiki page.
In a sharded environment the limitations on the maximum insertion rate are:
Because MongoDB distributes data using “ranges” (i.e. chunks) of keys, the choice of shard key can control how MongoDB distributes data and the resulting system’s capacity for writes and queries.
Ideally, your shard key should allow insertions to balance evenly among the shards [2] and permit most queries to access only a single shard. [3] Continue reading for an analysis of a collection of shard key choices.
| [2] | For this reason, avoid shard keys based on the timestamp or the insertion time (i.e. the ObjectId) because all writes will end up on a single node. |
| [3] | For this reason, avoid randomized shard keys (e.g. hash based shard keys) because any query will have to access all shards in the cluster. |
While using the timestamp, or the ObjectId in the _id field, [4] would distribute your data evenly among shards, these keys lead to two problems:
| [4] | The ObjectId derives from the creation time, and is effectively a timestamp in this case. |
To distribute data more evenly among the shards, you may consider using a more “random” piece of data, such as a hash of the _id field (i.e. the ObjectId) as a shard key.
While this introduces some additional complexity into your application to generate the key, it will distribute writes among the shards. In these deployments, having 5 shards will provide 5 times the write capacity of a single instance.
Using this shard key, or any other hashed value as a key, presents the following downsides:
This might be an acceptable trade-off in some situations. Because the workload of event logging systems tends to be heavily skewed toward writing, read performance may not be as critical as robust write performance.
| [5] | Typically, it is difficult to use these kinds of shard keys in queries. |
If a field in your documents has values that are evenly distributed among the documents, you may consider using this key as a shard key.
Continuing the example from above, you may consider using the path field, which may have a couple of advantages:
There are a few potential problems with these kinds of shard keys:
Note
Test using your existing data to ensure that the distribution is truly even, and that there is a sufficient quantity of distinct values for the shard key.
MongoDB supports compound shard keys that combine the best aspects of sharding by an evenly distributed key and sharding by a random key. In these situations, the shard key would resemble { path: 1 , ssk: 1 } where path is an often-used “natural” key, or value from your data, and ssk is a hash of the _id field. [6]
Using this type of shard key, data is largely distributed by the natural key, or path, which makes most queries that access the path field local to a single shard or group of shards. At the same time, if there is not sufficient distribution for specific values of path, the ssk makes it possible for MongoDB to split chunks and distribute data across the cluster.
In most situations, these kinds of keys provide the ideal balance between distributing writes across the cluster and ensuring that most queries will only need to access a select number of shards.
| [6] | You must still calculate the value of this synthetic key in your application when you insert documents into your collection. |
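A sketch of computing the synthetic ssk component in the application before insertion. The helper name and the choice of MD5 are illustrative assumptions; any stable hash of _id works:

```python
import hashlib

def synthetic_shard_key(doc):
    """Hypothetical helper: derive the ssk component from the document's _id."""
    digest = hashlib.md5(str(doc["_id"]).encode()).hexdigest()
    return int(digest[:4], 16)   # small integer, roughly evenly distributed

doc = {"_id": "4f442120eb03305789000000", "path": "/apache_pb.gif"}
doc["ssk"] = synthetic_shard_key(doc)
# The collection's shard key would then be { path: 1, ssk: 1 }.
```

Because the hash is deterministic, the same document always produces the same ssk, so re-inserts and application restarts route consistently.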
Selecting shard keys is difficult because: there are no definitive “best-practices,” the decision has a large impact on performance, and it is difficult or impossible to change the shard key after making the selection.
The sharding options above provide a good starting point for thinking about shard key selection. Nevertheless, the best way to select a shard key is to analyze the actual insertions and queries from your own application.
Without some strategy for managing the size of your database, most event logging systems can grow infinitely. This is particularly important because MongoDB may not relinquish disk space to the file system in the way you might expect. Consider the following strategies for managing data growth:
Depending on your data retention requirements as well as your reporting and analytics needs, you may consider using a capped collection to store your events. Capped collections have a fixed size, and drop old data when inserting new data after reaching the cap.
Note
In the current version, it is not possible to shard capped collections.
Strategy: Periodically rename your event collection so that your data collection rotates in much the same way that you might rotate log files. When needed, you can drop the oldest collection from the database.
This approach has several advantages over the single collection approach:
Nevertheless, this operation may add some complexity to queries, if any of your analyses depend on events that may reside in the current and previous collections. For most real-time data collection systems, this approach is ideal.
Strategy: Rotate databases rather than collections, as in the “Multiple Collections, Single Database” example.
While this significantly increases application complexity for insertions and queries, when you drop old databases, MongoDB will return disk space to the file system. This approach makes the most sense in scenarios where your event insertion rates and/or your data retention requirements are extremely variable.
For example, if you are performing a large backfill of event data and want to make sure that the entire set of event data for 90 days is available during the backfill, but during normal operations you only need 30 days of event data, you might consider using multiple databases.
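A sketch of the routing and retention logic for monthly database rotation. The naming scheme (events_YYYYMM) and both helper names are hypothetical:

```python
from datetime import date

def db_name_for(day):
    """Route events into a per-month database, e.g. events_200010."""
    return "events_%04d%02d" % (day.year, day.month)

def databases_to_drop(existing, keep):
    """From a sorted list of monthly database names, pick those past retention."""
    return existing[:-keep] if len(existing) > keep else []

current = db_name_for(date(2000, 10, 10))   # "events_200010"
```

Dropping a whole database is a cheap metadata operation that returns its files to the operating system, unlike deleting documents in place.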
This document outlines the basic patterns and principles for using MongoDB as an engine for collecting and processing events in real time for use in generating up to the minute or second reports.
Servers and other systems can generate a large number of documents, and it can be difficult to access and analyze such large collections of data originating from multiple servers.
This document makes the following assumptions about real-time analytics:
See also
The solution described below assumes a simple scenario using data from web server access logs. With this data, you will want to return the number of hits to a collection of web sites at various levels of granularity based on time (i.e. by minute, hour, day, week, and month) as well as by the path of a resource.
To achieve the required performance to support these tasks, upserts and increment operations will allow you to calculate statistics, produce simple range-based queries, and generate filters to support time-series charts of aggregated data.
Schemas for real-time analytics systems must support simple and fast query and update operations. In particular, attempt to avoid the following situations which can degrade performance:
documents growing significantly after creation.
Document growth forces MongoDB to move the document on disk, which can be time and resource consuming relative to other operations;
queries requiring MongoDB to scan documents in the collection without using indexes; and
deeply nested documents that make accessing particular fields slow.
Intuitively, you may consider keeping “hit counts” in individual documents with one document for every unit of time (i.e. minute, hour, day, etc.) However, queries must return multiple documents for all non-trivial time-range queries, which can slow overall query performance.
Preferably, to maximize query performance, use more complex documents, and keep several aggregate values in each document. The remainder of this section outlines several schema designs that you may consider for this real-time analytics system. While there is no single pattern for every problem, each pattern is better suited to specific classes of problems.
Consider the following example schema for a solution that stores all statistics for a single day and page in a single document:
{
_id: "20101010/site-1/apache_pb.gif",
metadata: {
date: ISODate("2000-10-10T00:00:00Z"),
site: "site-1",
page: "/apache_pb.gif" },
daily: 5468426,
hourly: {
"0": 227850,
"1": 210231,
...
"23": 20457 },
minute: {
"0": 3612,
"1": 3241,
...
"1439": 2819 }
}
This approach has a couple of advantages:
There are, however, significant issues with this approach. The most significant issue is that, as you upsert data into the hourly and monthly fields, the document grows. Although MongoDB will pad the space allocated to documents, it may still need to reallocate these documents multiple times throughout the day, which impacts performance.
To mitigate the impact of repeated document migrations throughout the day, you can tweak the “one document per page per day” approach by adding a process that “pre-allocates” documents with fields that hold 0 values throughout the previous day. Thus, at midnight, new documents will exist.
Note
To avoid situations where your application must pre-allocate large numbers of documents at midnight, it’s best to create documents throughout the previous day by upserting randomly when you update a value in the current day’s data.
This requires some tuning, to balance two requirements:
As a starting point, consider the average number of hits a day (h), and then upsert a blank document upon update with a probability of 1/h.
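The 1/h upsert heuristic can be sketched as follows; `maybe_preallocate` is a hypothetical helper, and the rng parameter exists only to make the behavior testable:

```python
import random

def maybe_preallocate(avg_daily_hits, rng=random):
    """On each update, pre-allocate tomorrow's document with probability 1/h.

    With h updates expected per day, this fires roughly once per day,
    spreading the pre-allocation cost instead of concentrating it at midnight.
    """
    return rng.random() < 1.0 / avg_daily_hits
```

When it returns True, the application would upsert a zero-filled document for the next day's key.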
Pre-allocating increases performance by initializing all documents with 0 values in all fields. After creation, documents never grow, so MongoDB never needs to move them and all updates occur in place.
Note
MongoDB stores BSON documents as a sequence of fields and values, not as a hash table. As a result, writing to the field minute.0 is considerably faster than writing to minute.1439.
In order to update the value in minute #1439, MongoDB must skip over all 1439 entries before it.
To optimize update and insert operations you can introduce intra-document hierarchy. In particular, you can split the minute field up into 24 hourly fields:
{
    _id: "20101010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    daily: 5468426,
    hourly: {
        "0": 227850,
        "1": 210231,
        ...
        "23": 20457 },
    minute: {
        "0": {
            "0": 3612,
            "1": 3241,
            ...
            "59": 2130 },
        "1": {
            "0": ... ,
        },
        ...
        "23": {
            ...
            "59": 2819 }
    }
}
This allows MongoDB to “skip forward” throughout the day when updating the minute data, which makes the update performance more uniform and faster later in the day.
To update the value in minute #1439, MongoDB first skips the 23 earlier hour sub-documents and then skips 59 minutes within the last hour, for only 82 skips as opposed to 1439 skips in the previous schema.
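A back-of-the-envelope sketch of this skip arithmetic, using hypothetical helper names:

```python
def flat_skips(minute):
    # Flat schema: minute keys "0" .. "1439" in one sub-document;
    # MongoDB walks past every earlier field to reach the target.
    return minute

def hierarchical_skips(minute):
    # Hierarchical schema: skip the earlier hour sub-documents, then
    # the earlier minutes within the target hour.
    hour, offset = divmod(minute, 60)
    return hour + offset

# For the last minute of the day:
print(flat_skips(1439))          # 1439
print(hierarchical_skips(1439))  # 82, i.e. 23 hours + 59 minutes
```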
Pre-allocating documents is a reasonable design for storing intra-day data, but the model breaks down when displaying data over longer multi-day periods like months or quarters. In these cases, consider storing daily statistics in a single document as above, and then aggregate monthly data into a separate document.
This introduces a second set of upsert operations to the data collection and aggregation portion of your application, but the reduction in disk seeks on queries should be worth the cost. Consider the following example schema:
Daily Statistics
{
    _id: "20101010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    hourly: {
        "0": 227850,
        "1": 210231,
        ...
        "23": 20457 },
    minute: {
        "0": {
            "0": 3612,
            "1": 3241,
            ...
            "59": 2130 },
        "1": {
            "0": ...,
        },
        ...
        "23": {
            "59": 2819 }
    }
}
Monthly Statistics
{
    _id: "201010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-01T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    daily: {
        "1": 5445326,
        "2": 5214121,
        ... }
}
This section outlines a number of common operations for building and interacting with a real-time analytics reporting system. The major challenge is balancing read performance against write (i.e. upsert) performance. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
Logging an event such as a page request (i.e. “hit”) is the main “write” activity for your system. To maximize performance, you’ll be doing in-place updates with the upsert operation. Consider the following example:
from datetime import datetime, time

def log_hit(db, dt_utc, site, page):
    # Update daily stats doc
    id_daily = dt_utc.strftime('%Y%m%d/') + site + page
    hour = dt_utc.hour
    minute = dt_utc.minute

    # Get a datetime that only includes date info
    d = datetime.combine(dt_utc.date(), time.min)

    query = {
        '_id': id_daily,
        'metadata': { 'date': d, 'site': site, 'page': page } }
    update = { '$inc': {
        'hourly.%d' % (hour,): 1,
        'minute.%d.%d' % (hour, minute): 1 } }
    db.stats.daily.update(query, update, upsert=True)

    # Update monthly stats document
    id_monthly = dt_utc.strftime('%Y%m/') + site + page
    day_of_month = dt_utc.day
    query = {
        '_id': id_monthly,
        'metadata': {
            'date': d.replace(day=1),
            'site': site,
            'page': page } }
    update = { '$inc': {
        'daily.%d' % day_of_month: 1 } }
    db.stats.monthly.update(query, update, upsert=True)
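As a quick sanity check, the _id values built by log_hit() line up with the sample documents shown earlier; the timestamp below is a hypothetical hit chosen to match them:

```python
from datetime import datetime

# A hypothetical hit to the page from the sample schema.
dt_utc = datetime(2010, 10, 10, 14, 17, 22)
site, page = 'site-1', '/apache_pb.gif'

# Same id construction as log_hit() above.
id_daily = dt_utc.strftime('%Y%m%d/') + site + page
id_monthly = dt_utc.strftime('%Y%m/') + site + page

print(id_daily)    # 20101010/site-1/apache_pb.gif
print(id_monthly)  # 201010/site-1/apache_pb.gif
```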
The upsert operation (i.e. upsert=True) performs an update if the document exists, and an insert if the document does not exist.
Note
This application requires upserts, because the pre-allocation method only pre-allocates new documents with a high probability, not with complete certainty.
Without preallocation, you end up with a dynamically growing document, slowing upserts as MongoDB moves documents to accommodate growth.
To prevent document growth, you can preallocate new documents before the system needs them. As you create new documents, set all values to 0 so that documents will not grow to accommodate updates. Consider the following preallocate() function:
def preallocate(db, dt_utc, site, page):
    # Get id values
    id_daily = dt_utc.strftime('%Y%m%d/') + site + page
    id_monthly = dt_utc.strftime('%Y%m/') + site + page

    # Get daily metadata
    daily_metadata = {
        'date': datetime.combine(dt_utc.date(), time.min),
        'site': site,
        'page': page }
    # Get monthly metadata
    monthly_metadata = {
        'date': daily_metadata['date'].replace(day=1),
        'site': site,
        'page': page }

    # Initial zeros for statistics
    hourly = dict((str(i), 0) for i in range(24))
    minute = dict(
        (str(i), dict((str(j), 0) for j in range(60)))
        for i in range(24))
    daily = dict((str(i), 0) for i in range(1, 32))

    # Perform upserts, setting metadata
    db.stats.daily.update(
        {
            '_id': id_daily,
            'hourly': hourly,
            'minute': minute },
        { '$set': { 'metadata': daily_metadata } },
        upsert=True)
    db.stats.monthly.update(
        {
            '_id': id_monthly,
            'daily': daily },
        { '$set': { 'metadata': monthly_metadata } },
        upsert=True)
The function pre-allocates both the monthly and daily documents at the same time. The performance benefit of separating these operations is negligible, so it’s reasonable to keep both operations in the same function.
Ideally, your application should pre-allocate documents before it needs to write data, to maintain consistent update performance. Additionally, it’s important to avoid causing a spike in activity and latency by creating all documents at once.
In the following example, document updates (i.e. log_hit()) will also pre-allocate a document probabilistically. By tuning this probability, you can limit redundant preallocate() calls.
import random
from datetime import datetime, timedelta, time

# Example probability based on 500k hits per day per page
prob_preallocate = 1.0 / 500000

def log_hit(db, dt_utc, site, page):
    if random.random() < prob_preallocate:
        preallocate(db, dt_utc + timedelta(days=1), site, page)
    # Update daily stats doc
    ...
Using this method, there will be a high probability that each document will already exist before your application needs to issue update operations. You’ll also be able to prevent a regular spike in activity for pre-allocation, and be able to eliminate document growth.
This example describes fetching the data from the above MongoDB system, for use in generating a chart that displays the number of hits to a particular resource over the last hour.
Use the following query in a find_one operation at the Python/PyMongo console to retrieve the number of hits to a specific resource (i.e. /index.html) with minute-level granularity:
>>> db.stats.daily.find_one(
...     {'metadata': {'date': dt, 'site': 'site-1', 'page': '/index.html'}},
...     { 'minute': 1 })
Use the following query to retrieve the number of hits to a resource over the last day, with hour-level granularity:
>>> db.stats.daily.find_one(
... {'metadata': {'date':dt, 'site':'site-1', 'page':'/foo.gif'}},
...     { 'hourly': 1 })
If you want a few days of hourly data, you can use a query in the following form:
>>> db.stats.daily.find(
... {
... 'metadata.date': { '$gte': dt1, '$lte': dt2 },
... 'metadata.site': 'site-1',
... 'metadata.page': '/index.html'},
...     { 'metadata.date': 1, 'hourly': 1 },
... sort=[('metadata.date', 1)])
To support these query operations, create a compound index on the following daily statistics fields: metadata.site, metadata.page, and metadata.date (in that order). Use the following operation at the Python/PyMongo console:
>>> db.stats.daily.ensure_index([
... ('metadata.site', 1),
... ('metadata.page', 1),
... ('metadata.date', 1)])
This index makes it possible to efficiently run the query for multiple days of hourly data. At the same time, any compound index on page and date will allow you to query efficiently for a single day’s statistics.
To retrieve daily data for a single month, use the following query:
>>> db.stats.monthly.find_one(
... {'metadata':
... {'date':dt,
... 'site': 'site-1',
... 'page':'/index.html'}},
... { 'daily': 1 })
To retrieve several months of daily data, use a variation on the above query:
>>> db.stats.monthly.find(
... {
... 'metadata.date': { '$gte': dt1, '$lte': dt2 },
... 'metadata.site': 'site-1',
... 'metadata.page': '/index.html'},
...     { 'metadata.date': 1, 'daily': 1 },
... sort=[('metadata.date', 1)])
Create the following index to support these queries for monthly data on the metadata.site, metadata.page, and metadata.date fields:
>>> db.stats.monthly.ensure_index([
... ('metadata.site', 1),
... ('metadata.page', 1),
... ('metadata.date', 1)])
This field order will efficiently support range queries for a single page over several months.
The only potential limits on the performance of this system are the number of shards in your system, and the shard key that you use.
An ideal shard key will distribute upserts between the shards while routing all queries to a single shard, or a small number of shards.
While your choice of shard key may depend on the precise workload of your deployment, consider using { metadata.site: 1, metadata.page: 1 } as a shard key. The combination of site and page (or event) will lead to a well balanced cluster for most deployments.
Enable sharding for the daily statistics collection with the following shardCollection command in the Python/PyMongo console:
>>> db.command('shardCollection', 'stats.daily', {
...     'key': { 'metadata.site': 1, 'metadata.page': 1 } })
Upon success, you will see the following response:
{ "collectionsharded" : "stats.daily", "ok" : 1 }
Enable sharding for the monthly statistics collection with the following shardCollection command in the Python/PyMongo console:
>>> db.command('shardCollection', 'stats.monthly', {
...     'key': { 'metadata.site': 1, 'metadata.page': 1 } })
Upon success, you will see the following response:
{ "collectionsharded" : "stats.monthly", "ok" : 1 }
One downside of the { metadata.site: 1, metadata.page: 1 } shard key is that, if one page dominates all of your traffic, all updates to that page will go to a single shard. This is basically unavoidable, since all updates for a single page go to a single document.
You may wish to include the date in addition to the site and page fields so that MongoDB can split the history of each page, allowing different historical ranges to be served by different shards. Use the following shardCollection command to shard the daily statistics collection in the Python/PyMongo console:
>>> db.command('shardCollection', 'stats.daily', {
... 'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}})
{ "collectionsharded" : "stats.daily", "ok" : 1 }
Enable sharding for the monthly statistics collection with the following shardCollection command in the Python/PyMongo console:
>>> db.command('shardCollection', 'stats.monthly', {
... 'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}})
{ "collectionsharded" : "stats.monthly", "ok" : 1 }
Note
Determine your actual requirements and load before deciding to shard. In many situations a single MongoDB instance may be able to keep track of all events and pages.
If you collect a large amount of data, but do not pre-aggregate, and you want to have access to aggregated information and reports, then you need a method to aggregate these data into a usable form. This document provides an overview of these aggregation patterns and processes.
For clarity, this case study assumes that the incoming event data resides in a collection named events. For details on how you might get the event data into the events collection, see the “Storing Log Data” document. This document continues that example.
The first step in the aggregation process is to aggregate event data into the finest required granularity. Then use this aggregation to generate the next, less specific level of granularity, and repeat this process until you have generated all required views.
The solution uses several collections: the raw data (i.e. events) collection as well as collections for aggregated hourly, daily, weekly, monthly, and yearly statistics. All aggregations use the mapReduce command, in a hierarchical process. The following figure illustrates the input and output of each job:
Hierarchy of data aggregation.
Note
Aggregating raw events into an hourly collection is qualitatively different from the operation that aggregates hourly statistics into the daily collection.
See also
Map-reduce and the MapReduce wiki page for more information on the Map-reduce data aggregation paradigm.
When designing the schema for event storage, it’s important to track the events included in the aggregation and events that are not yet included.
Relational Approach
A simple tactic from relational databases uses an auto-incremented integer as the primary key. However, this introduces a significant performance penalty for the event logging process, because it must fetch new keys one at a time.
If you can batch your inserts into the events collection, you can use an auto-increment primary key by using the find_and_modify command to generate the _id values, as in the following example:
>>> obj = db.my_sequence.find_and_modify(
... query={'_id':0},
...     update={'$inc': {'inc': 50}},
... upsert=True,
... new=True)
>>> batch_of_ids = range(obj['inc']-50, obj['inc'])
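The following sketch shows how an application might consume the reserved block of _id values. The reserve_ids() helper and the in-memory counter dict are hypothetical stand-ins for the my_sequence document and the find_and_modify call above:

```python
def reserve_ids(counter, batch_size=50):
    # Stand-in for find_and_modify: atomically increment the counter
    # document and hand back the newly reserved contiguous range.
    counter['inc'] = counter.get('inc', 0) + batch_size
    return range(counter['inc'] - batch_size, counter['inc'])

counter = {'_id': 0}

first = list(reserve_ids(counter))   # ids 0 .. 49
second = list(reserve_ids(counter))  # ids 50 .. 99

# Assign the reserved ids to a batch of events before inserting them.
events = [{'_id': i, 'userid': 'rick', 'length': 95} for i in first]
```

Each call reserves a disjoint block, so concurrent batch inserters never collide on _id values.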
However, in most cases you can simply include a timestamp with each event that you can use to distinguish processed events from unprocessed events.
This example assumes that you are calculating average session length for logged-in users on a website. The events will have the following form:
{
    "userid": "rick",
    "ts": ISODate('2010-10-10T14:17:22Z'),
    "length": 95
}
The operations described in the next section will calculate total and average session times for each user at the hour, day, week, month, and year levels. For each aggregation you will want to store the number of sessions so that MongoDB can incrementally recompute the average session times. The aggregate documents will resemble the following:
{
    _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z") },
    value: {
        ts: ISODate('2010-10-10T15:01:00Z'),
        total: 254,
        count: 10,
        mean: 25.4 }
}
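Storing total and count (rather than only mean) is what makes incremental recomputation possible: two partial aggregates can be combined and the mean rederived. A hypothetical sketch:

```python
def merge(a, b):
    # Combine two partial aggregates; the mean is recomputed from the
    # running total and count, mirroring the finalize step used later.
    total = a['total'] + b['total']
    count = a['count'] + b['count']
    return {'total': total, 'count': count, 'mean': total / float(count)}

hour_so_far = {'total': 254, 'count': 10, 'mean': 25.4}
new_sessions = {'total': 46, 'count': 2, 'mean': 23.0}

print(merge(hour_so_far, new_sessions))
# total 300, count 12, mean 25.0
```

Note that averaging the two means directly (25.4 and 23.0) would weight them incorrectly; only the totals and counts merge cleanly.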
Note
Note the timestamp value in the _id sub-document, which will allow you to incrementally update documents at various levels of the hierarchy.
This section assumes that all events exist in the events collection and have a timestamp. The operations are thus to aggregate from the events collection into the smallest aggregate (hourly totals) and then to aggregate from the hourly totals into coarser granularity levels. In all cases, these operations store the time of the last aggregation run in a last_run variable.
Note
Although this solution uses Python and PyMongo to connect with MongoDB, you must pass JavaScript functions (i.e. mapf, reducef, and finalizef) to the mapReduce command.
Begin by creating a map function, as below:
mapf_hour = bson.Code('''function() {
    var key = {
        u: this.userid,
        d: new Date(
            this.ts.getFullYear(),
            this.ts.getMonth(),
            this.ts.getDate(),
            this.ts.getHours(),
            0, 0, 0) };
    emit(
        key,
        {
            total: this.length,
            count: 1,
            mean: 0,
            ts: new Date() });
}''')
In this case, the function emits key-value pairs that contain the data you want to aggregate. The function also emits a ts value that makes it possible to cascade aggregations to coarser-grained aggregations (i.e. hour to day, etc.)
Consider the following reduce function:
reducef = bson.Code('''function(key, values) {
    var r = { total: 0, count: 0, mean: 0, ts: null };
    values.forEach(function(v) {
        r.total += v.total;
        r.count += v.count;
    });
    return r;
}''')
The reduce function returns a document in the same format as the output of the map function. This pattern for map and reduce functions makes map-reduce processes easier to test and debug.
While the reduce function ignores the mean and ts (timestamp) values, the finalize step, as follows, computes these data:
finalizef = bson.Code('''function(key, value) {
    if(value.count > 0) {
        value.mean = value.total / value.count;
    }
    value.ts = new Date();
    return value;
}''')
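Because the map, reduce, and finalize functions ship to the server as JavaScript, it can be useful to mirror their combined logic in plain Python when testing. The following hypothetical sketch groups events by (userid, hour) and computes total, count, and mean, just as the three functions above do together:

```python
from collections import defaultdict
from datetime import datetime

def aggregate_hourly(events):
    # Map: key each event by (userid, start of hour).
    # Reduce: sum totals and counts. Finalize: compute the mean.
    acc = defaultdict(lambda: {'total': 0, 'count': 0})
    for e in events:
        hour = e['ts'].replace(minute=0, second=0, microsecond=0)
        key = (e['userid'], hour)
        acc[key]['total'] += e['length']
        acc[key]['count'] += 1
    for v in acc.values():
        v['mean'] = v['total'] / float(v['count'])
    return dict(acc)

events = [
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 17), 'length': 95},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 40), 'length': 25},
]
out = aggregate_hourly(events)
# out[('rick', datetime(2010, 10, 10, 14, 0))]
#   == {'total': 120, 'count': 2, 'mean': 60.0}
```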
With the above functions, the map_reduce operation itself will resemble the following:
cutoff = datetime.utcnow() - timedelta(seconds=60)
query = { 'ts': { '$gt': last_run, '$lt': cutoff } }
db.events.map_reduce(
    map=mapf_hour,
    reduce=reducef,
    finalize=finalizef,
    query=query,
    out={ 'reduce': 'stats.hourly' })
last_run = cutoff
The cutoff variable allows you to process all events that have occurred since the last run but before 1 minute ago. This allows for some delay in logging events. You can safely run this aggregation as often as you like, provided that you update the last_run variable each time.
Create an index on the timestamp (i.e. the ts field) to support the query selection of the map_reduce operation. Use the following operation at the Python/PyMongo console:
>>> db.events.ensure_index('ts')
To calculate daily statistics, use the hourly statistics as input. Begin with the following map function:
mapf_day = bson.Code('''function() {
    var key = {
        u: this._id.u,
        d: new Date(
            this._id.d.getFullYear(),
            this._id.d.getMonth(),
            this._id.d.getDate(),
            0, 0, 0, 0) };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null });
}''')
The map function for deriving day-level data differs from the initial aggregation above in the following ways:
the aggregation key is the (userid, date) rather than (userid, hour) to support daily aggregation.
the keys and values emitted (i.e. emit()) are actually the total and count values from the hourly aggregates rather than properties from event documents.
This is the case for all the higher-level aggregation operations.
Because the output of this map function is the same as the previous map function, you can use the same reduce and finalize functions.
The actual code driving this level of aggregation is as follows:
cutoff = datetime.utcnow() - timedelta(seconds=60)
query = { 'value.ts': { '$gt': last_run, '$lt': cutoff } }
db.stats.hourly.map_reduce(
    map=mapf_day,
    reduce=reducef,
    finalize=finalizef,
    query=query,
    out={ 'reduce': 'stats.daily' })
last_run = cutoff
There are a couple of things to note here. First of all, the query is not on ts now, but value.ts, the timestamp written during the finalization of the hourly aggregates. Also note that you are, in fact, aggregating from the stats.hourly collection into the stats.daily collection.
Because you will regularly run this query, which filters on the value.ts field, create an index to support it. Use the following operation in the Python/PyMongo shell:
>>> db.stats.hourly.ensure_index('value.ts')
You can use the aggregated day-level data to generate weekly and monthly statistics. A map function for generating weekly data follows:
mapf_week = bson.Code('''function() {
    var key = {
        u: this._id.u,
        d: new Date(
            this._id.d.valueOf()
            - this._id.d.getDay()*24*60*60*1000) };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null });
}''')
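Note that JavaScript’s getDay() is Sunday-based. If you mirror this date arithmetic in Python, where weekday() is Monday-based, you must convert first; the week_start() helper below is a hypothetical sketch of the same truncation:

```python
from datetime import date, timedelta

def week_start(d):
    # JavaScript's getDay() counts from Sunday (0), while Python's
    # weekday() counts from Monday (0); convert before subtracting.
    days_since_sunday = (d.weekday() + 1) % 7
    return d - timedelta(days=days_since_sunday)

# 2010-10-10 was a Sunday, so any day that week maps back to it.
print(week_start(date(2010, 10, 13)))  # 2010-10-10
print(week_start(date(2010, 10, 10)))  # 2010-10-10
```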
Here, to get the group key, the function takes the current date and subtracts days until it reaches the beginning of the week. In the monthly map function, you’ll use the first day of the month as the group key, as follows:
mapf_month = bson.Code('''function() {
    var key = {
        u: this._id.u,
        d: new Date(
            this._id.d.getFullYear(),
            this._id.d.getMonth(),
            1, 0, 0, 0, 0) };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null });
}''')
These map functions are identical to each other except for the date calculation.
Create additional indexes to support the weekly and monthly aggregation operations on the value.ts field. Use the following operations in the Python/PyMongo shell.
>>> db.stats.daily.ensure_index('value.ts')
>>> db.stats.monthly.ensure_index('value.ts')
Use Python’s string interpolation to refactor the map function definitions as follows:
mapf_hierarchical = '''function() {
    var key = {
        u: this._id.u,
        d: %s };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null });
}'''

mapf_day = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.getFullYear(),
        this._id.d.getMonth(),
        this._id.d.getDate(),
        0, 0, 0, 0)''')

mapf_week = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.valueOf()
        - this._id.d.getDay()*24*60*60*1000)''')

mapf_month = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.getFullYear(),
        this._id.d.getMonth(),
        1, 0, 0, 0, 0)''')

mapf_year = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.getFullYear(),
        0, 1, 0, 0, 0, 0)''')
You can create a h_aggregate function to wrap the map_reduce operation, as below, to reduce code duplication:
def h_aggregate(icollection, ocollection, mapf, cutoff, last_run):
    query = { 'value.ts': { '$gt': last_run, '$lt': cutoff } }
    icollection.map_reduce(
        map=mapf,
        reduce=reducef,
        finalize=finalizef,
        query=query,
        out={ 'reduce': ocollection.name })
With h_aggregate defined, you can perform all aggregation operations as follows:
cutoff = datetime.utcnow() - timedelta(seconds=60)
h_aggregate(db.events, db.stats.hourly, mapf_hour, cutoff, last_run)
h_aggregate(db.stats.hourly, db.stats.daily, mapf_day, cutoff, last_run)
h_aggregate(db.stats.daily, db.stats.weekly, mapf_week, cutoff, last_run)
h_aggregate(db.stats.daily, db.stats.monthly, mapf_month, cutoff, last_run)
h_aggregate(db.stats.monthly, db.stats.yearly, mapf_year, cutoff, last_run)
last_run = cutoff
As long as you save and restore the last_run variable between aggregations, you can run these aggregations as often as you like since each aggregation operation is incremental.
Ensure that you choose a shard key that is not the incoming timestamp, but rather something that varies significantly in the most recent documents. In the example above, consider using the userid as the most significant part of the shard key.
To prevent a single, active user from creating a large chunk that MongoDB cannot split, use a compound shard key of (userid, timestamp) on the events collection. Consider the following:
>>> db.command('shardCollection','events', {
... 'key' : { 'userid': 1, 'ts' : 1} } )
{ "collectionsharded": "events", "ok" : 1 }
To shard the aggregated collections you must use the _id field, so you can issue the following group of shard operations in the Python/PyMongo shell:
db.command('shardCollection', 'stats.daily', {
'key': { '_id': 1 } })
db.command('shardCollection', 'stats.weekly', {
'key': { '_id': 1 } })
db.command('shardCollection', 'stats.monthly', {
'key': { '_id': 1 } })
db.command('shardCollection', 'stats.yearly', {
'key': { '_id': 1 } })
You should also update the h_aggregate map-reduce wrapper to support sharded output. Add 'sharded': True to the out argument. See the full sharded h_aggregate function:
def h_aggregate(icollection, ocollection, mapf, cutoff, last_run):
    query = { 'value.ts': { '$gt': last_run, '$lt': cutoff } }
    icollection.map_reduce(
        map=mapf,
        reduce=reducef,
        finalize=finalizef,
        query=query,
        out={ 'reduce': ocollection.name, 'sharded': True })
MongoDB’s flexible schema makes it particularly well suited to storing information for product data management and e-commerce websites and solutions. The “Product Catalog” document describes methods and practices for modeling and managing a product catalog using MongoDB, while the “Inventory Management” document introduces a pattern for handling interactions between inventory and users’ shopping carts. Finally the “Category Hierarchy” document describes methods for interacting with category hierarchies in MongoDB.
This document describes the basic patterns and principles for designing an E-Commerce product catalog system using MongoDB as a storage engine.
Product catalogs must have the capacity to store many different types of objects with different sets of attributes. These kinds of data collections are quite compatible with MongoDB’s data model, but many important considerations and design decisions remain.
For relational databases, there are several solutions that address this problem, each with a different performance profile. This section examines several of these options and then describes the preferred MongoDB solution.
One approach, in a relational model, is to create a table for each product category. Consider the following example SQL statement for creating database tables:
CREATE TABLE `product_audio_album` (
`sku` char(8) NOT NULL,
...
`artist` varchar(255) DEFAULT NULL,
`genre_0` varchar(255) DEFAULT NULL,
`genre_1` varchar(255) DEFAULT NULL,
...,
PRIMARY KEY(`sku`))
...
CREATE TABLE `product_film` (
`sku` char(8) NOT NULL,
...
`title` varchar(255) DEFAULT NULL,
`rating` char(8) DEFAULT NULL,
...,
PRIMARY KEY(`sku`))
...
This approach has limited flexibility for two key reasons: you must create a new table for every new category of product, and you must explicitly tailor all queries to the exact type of product.
Another relational data model uses a single table for all product categories and adds new columns anytime you need to store data regarding a new type of product. Consider the following SQL statement:
CREATE TABLE `product` (
`sku` char(8) NOT NULL,
...
`artist` varchar(255) DEFAULT NULL,
`genre_0` varchar(255) DEFAULT NULL,
`genre_1` varchar(255) DEFAULT NULL,
...
`title` varchar(255) DEFAULT NULL,
`rating` char(8) DEFAULT NULL,
...,
PRIMARY KEY(`sku`))
This approach is more flexible than concrete table inheritance: it allows single queries to span different product types, but at the expense of space.
Also in the relational model, you may use a “multiple table inheritance” pattern to represent common attributes in a generic “product” table, with some variations in individual category product tables. Consider the following SQL statement:
CREATE TABLE `product` (
`sku` char(8) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`price`, ...
PRIMARY KEY(`sku`))
CREATE TABLE `product_audio_album` (
`sku` char(8) NOT NULL,
...
`artist` varchar(255) DEFAULT NULL,
`genre_0` varchar(255) DEFAULT NULL,
`genre_1` varchar(255) DEFAULT NULL,
...,
PRIMARY KEY(`sku`),
FOREIGN KEY(`sku`) REFERENCES `product`(`sku`))
...
CREATE TABLE `product_film` (
`sku` char(8) NOT NULL,
...
`title` varchar(255) DEFAULT NULL,
`rating` char(8) DEFAULT NULL,
...,
PRIMARY KEY(`sku`),
FOREIGN KEY(`sku`) REFERENCES `product`(`sku`))
...
Multiple table inheritance is more space-efficient than single table inheritance and somewhat more flexible than concrete table inheritance. However, this model does require an expensive JOIN operation to obtain all relevant attributes relevant to a product.
The final substantive pattern from relational modeling is the entity-attribute-value schema where you would create a meta-model for product data. In this approach, you maintain a table with three columns, e.g. entity_id, attribute_id, value, and these triples describe each product.
Consider the description of an audio recording. You may have a series of rows representing the following relationships:
| Entity | Attribute | Value |
|---|---|---|
| sku_00e8da9b | type | Audio Album |
| sku_00e8da9b | title | A Love Supreme |
| sku_00e8da9b | ... | ... |
| sku_00e8da9b | artist | John Coltrane |
| sku_00e8da9b | genre | Jazz |
| sku_00e8da9b | genre | General |
| ... | ... | ... |
This schema is entirely flexible: any entity can have any set of attributes, and new product types require no changes to the schema.
The downside of these models is that all nontrivial queries require large numbers of JOIN operations, which results in large performance penalties.
Additionally, some e-commerce solutions with relational database systems avoid choosing one of the data models above and instead serialize all of this data into a BLOB column. While simple, this makes the details difficult to access for search and sort.
Because MongoDB is a non-relational database, the data model for your product catalog can benefit from this additional flexibility. The best models use a single MongoDB collection to store all the product data, which is similar to the single table inheritance relational model. MongoDB’s dynamic schema means that each document need not conform to the same schema. As a result, the document for each product only needs to contain attributes relevant to that product.
At the beginning of the document, the schema must contain general product information, to facilitate searches of the entire catalog. This is followed by a details sub-document that contains fields that vary between product types. Consider the following example document for an album product.
{
    sku: "00e8da9b",
    type: "Audio Album",
    title: "A Love Supreme",
    description: "by John Coltrane",
    asin: "B0000A118M",
    shipping: {
        weight: 6,
        dimensions: {
            width: 10,
            height: 10,
            depth: 1
        },
    },
    pricing: {
        list: 1200,
        retail: 1100,
        savings: 100,
        pct_savings: 8
    },
    details: {
        title: "A Love Supreme [Original Recording Reissued]",
        artist: "John Coltrane",
        genre: [ "Jazz", "General" ],
        ...
        tracks: [
            "A Love Supreme Part I: Acknowledgement",
            "A Love Supreme Part II - Resolution",
            "A Love Supreme, Part III: Pursuance",
            "A Love Supreme, Part IV-Psalm"
        ],
    },
}
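The pricing sub-document appears to store monetary values as integers (i.e. cents). A hypothetical helper that derives the savings fields, matching the sample document above:

```python
def make_pricing(list_cents, retail_cents):
    # Prices in cents; pct_savings rounds to a whole percentage,
    # matching the pricing sub-document in the album example.
    savings = list_cents - retail_cents
    return {
        'list': list_cents,
        'retail': retail_cents,
        'savings': savings,
        'pct_savings': int(round(100.0 * savings / list_cents)),
    }

print(make_pricing(1200, 1100))
# {'list': 1200, 'retail': 1100, 'savings': 100, 'pct_savings': 8}
```

Storing derived values like savings and pct_savings denormalizes the data slightly, but allows the discount queries shown later to use a simple indexed field.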
A movie item would have the same fields for general product information, shipping, and pricing, but a different details sub-document. Consider the following:
{
    sku: "00e8da9d",
    type: "Film",
    ...,
    asin: "B000P0J0AQ",
    shipping: { ... },
    pricing: { ... },
    details: {
        title: "The Matrix",
        director: [ "Andy Wachowski", "Larry Wachowski" ],
        writer: [ "Andy Wachowski", "Larry Wachowski" ],
        ...,
        aspect_ratio: "1.66:1"
    },
}
Note
In MongoDB, you can have fields that hold multiple values (i.e. arrays) without any restrictions on the number of fields or values (as with genre_0 and genre_1) and also without the need for a JOIN operation.
For most deployments the primary use of the product catalog is to perform search operations. This section provides an overview of various types of queries that may be useful for supporting an e-commerce site. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
This query returns the documents for the products of a specific genre, sorted in reverse chronological order:
query = db.products.find({'type': 'Audio Album',
                          'details.genre': 'Jazz'})
query = query.sort([('details.issue_date', -1)])
To support this query, create a compound index on all the properties used in the filter and in the sort:
db.products.ensure_index([
    ('type', 1),
    ('details.genre', 1),
    ('details.issue_date', -1)])
Note
The final component of the index is the sort field. This allows MongoDB to traverse the index in the sorted order to preclude a slow in-memory sort.
While most searches will be for a particular type of product (e.g. album, movie, etc.), in some situations you may want to return all products in a certain price range or with a certain discount percentage.
To return this data use the pricing information that exists in all products to find the products with the highest percentage discount:
query = db.products.find({ 'pricing.pct_savings': {'$gt': 25} })
query = query.sort([('pricing.pct_savings', -1)])
To support this type of query, you will want to create an index on the pricing.pct_savings field:
db.products.ensure_index('pricing.pct_savings')
Since MongoDB can read indexes in ascending or descending order, the order of the index does not matter.
Note
If you want to perform range queries (e.g. “return all products over $25”) and then sort by another property like pricing.retail, MongoDB cannot use the index as effectively.
The field on which you want to perform a range query or sort must be the last field in a compound index in order to avoid scanning an entire collection. Using different properties for the range condition and the sort within a single query requires some scanning, which will limit the speed of the query.
Use the following query to select documents of a specific product type (i.e. Film) whose details contain a certain value (i.e. a specific actor in the details.actor field), with the results sorted by date descending:
query = db.products.find({'type': 'Film',
'details.actor': 'Keanu Reeves'})
query = query.sort([('details.issue_date', -1)])
To support this query, you may want to create the following index:
db.products.ensure_index([
('type', 1),
('details.actor', 1),
('details.issue_date', -1)])
This index begins with the type field and then narrows by the other search field, where the final component of the index is the sort field to maximize index efficiency.
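The reasoning above can be sketched in pure Python (illustrative only, not MongoDB internals): model the compound index as tuples kept in (type, actor, issue date descending) order. Scanning the contiguous range for one type and actor then yields results already in the desired sort order, with no separate sort step. The titles and dates here are hypothetical examples.

```python
# Illustrative sketch: a compound index modeled as a sorted list of
# (type, actor, issue_date) tuples; issue_date is stored descending.
index = sorted(
    [('Film', 'Keanu Reeves', d) for d in (1999, 2003, 2021)] +
    [('Film', 'Carrie-Anne Moss', d) for d in (1999, 2003)],
    key=lambda e: (e[0], e[1], -e[2]))

# Scanning the contiguous prefix for one (type, actor) pair yields
# documents already ordered by issue_date descending.
matches = [e for e in index if e[:2] == ('Film', 'Keanu Reeves')]
dates = [e[2] for e in matches]
assert dates == sorted(dates, reverse=True)  # no in-memory sort needed
```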
Regardless of database engine, in order to retrieve this information the system will need to scan some number of documents or records to satisfy this query.
MongoDB supports regular expressions within queries. In Python, you can use the “re” module to construct the query:
import re
re_hacker = re.compile(r'.*hacker.*', re.IGNORECASE)
query = db.products.find({'type': 'Film', 'title': re_hacker})
query = query.sort([('details.issue_date', -1)])
MongoDB provides a special syntax for regular expression queries without the need for the re module. Consider the following alternative which is equivalent to the above example:
query = db.products.find({
'type': 'Film',
'title': {'$regex': '.*hacker.*', '$options':'i'}})
query = query.sort([('details.issue_date', -1)])
The $options operator specifies a case insensitive match.
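The two forms match the same strings. As a quick illustration using only Python's re module (the test strings here are hypothetical examples):

```python
import re

# The {'$regex': '.*hacker.*', '$options': 'i'} query matches the same
# strings as this case-insensitive Python pattern:
pattern = re.compile(r'.*hacker.*', re.IGNORECASE)

assert pattern.match('Hackers')           # case differs, still matches
assert pattern.match('The Last Hacker')   # substring anywhere in the title
assert not pattern.match('hacks')         # 'hacker' not present
```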
The indexing strategy for these kinds of queries is different from previous attempts. Here, create an index on { type: 1, details.issue_date: -1, title: 1 } using the following command at the Python/PyMongo console:
db.products.ensure_index([
('type', 1),
('details.issue_date', -1),
('title', 1)])
This index makes it possible to avoid scanning whole documents: MongoDB can scan the title values within the index rather than fetching each document to examine its title field. Additionally, placing the details.issue_date field before the title field ensures that the result set is already ordered by issue date before MongoDB filters on title.
Database performance for these kinds of deployments are dependent on indexes. You may use sharding to enhance performance by allowing MongoDB to keep larger portions of those indexes in RAM. In sharded configurations, select a shard key that allows mongos to route queries directly to a single shard or small group of shards.
Since most of the queries in this system include the type field, include this in the shard key. Beyond this, the remainder of the shard key is difficult to predict without information about your database’s actual activity and distribution.
In the following example, assume that the details.genre field is the second-most queried field after type. Enable sharding using the following shardCollection operation at the Python/PyMongo console:
>>> db.command('shardCollection', 'product', {
...     'key': { 'type': 1, 'details.genre': 1, 'sku': 1 } })
{ "collectionsharded" : "product", "ok" : 1 }
Note
Even if you choose a “poor” shard key that requires mongos to broadcast all queries to all shards, you will still see some benefits from sharding.
While sharding is the best way to scale operations, some data sets make it impossible to partition data so that mongos can route queries to specific shards. In these situations mongos sends the query to all shards and then combines the results before returning to the client.
In these situations, you can improve read performance by allowing mongos to read from the secondary instances in a replica set by configuring read preference in your client. Read preference is configurable on a per-connection or per-operation basis. In PyMongo, set the read_preference argument.
The SECONDARY property in the following example permits reads from a secondary (as well as a primary) for the entire connection:
conn = pymongo.Connection(read_preference=pymongo.SECONDARY)
Conversely, the SECONDARY_ONLY read preference means that the client will send read operations only to a secondary member:
conn = pymongo.Connection(read_preference=pymongo.SECONDARY_ONLY)
You can also specify read_preference for specific queries, as follows:
results = db.product.find(..., read_preference=pymongo.SECONDARY)
or
results = db.product.find(..., read_preference=pymongo.SECONDARY_ONLY)
This case study provides an overview of practices and patterns for designing and developing the inventory management portions of an E-commerce application.
Customers in e-commerce stores regularly add and remove items from their “shopping cart,” change quantities multiple times, abandon the cart at any point, and sometimes have problems during and after checkout that require a hold or canceled order. These activities make it difficult to maintain inventory systems and counts and ensure that customers cannot “buy” items that are unavailable while they shop in your store.
This solution keeps the traditional metaphor of the shopping cart, but the shopping cart will age. After a shopping cart has been inactive for a certain period of time, all items in the cart re-enter the available inventory and the cart is empty. The state transition diagram for a shopping cart is below:
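As a rough sketch of this lifecycle, the states and allowed transitions can be modeled in Python. The transition set here is an assumption inferred from the operations later in this document: checkout moves a cart from active to pending to complete (or back to active on a payment failure), and the expiration process moves a cart from active to expiring to expired.

```python
# Illustrative sketch of the cart states used in this document; the
# transition set is inferred from the checkout and expiration operations.
ALLOWED = {
    'active':   {'pending', 'expiring'},
    'pending':  {'complete', 'active'},
    'expiring': {'expired'},
}

def can_transition(old, new):
    """Return True if a cart may move from state `old` to state `new`."""
    return new in ALLOWED.get(old, set())

assert can_transition('active', 'pending')
assert can_transition('pending', 'active')      # payment failed
assert not can_transition('expired', 'active')  # expired carts stay expired
```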
Inventory collections must maintain counts of the current available inventory of each stock-keeping unit (SKU; or item) as well as a list of items in carts that may return to the available inventory if they are in a shopping cart that times out. In the following example, the _id field stores the SKU:
{
_id: '00e8da9b',
qty: 16,
carted: [
{ qty: 1, cart_id: 42,
timestamp: ISODate("2012-03-09T20:55:36Z"), },
{ qty: 2, cart_id: 43,
timestamp: ISODate("2012-03-09T21:55:36Z"), },
]
}
Note
These examples use a simplified schema. In a production implementation, you may choose to merge this schema with the product catalog schema described in the “Product Catalog” document.
The SKU above has 16 items in stock, 1 item in a cart, and 2 items in a second cart. This leaves a total of 19 unsold items of merchandise.
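A minimal pure-Python sketch of this arithmetic, using the example document above:

```python
# Total merchandise for a SKU is the available quantity plus everything
# currently held in carts.
inventory_doc = {
    '_id': '00e8da9b',
    'qty': 16,
    'carted': [
        {'qty': 1, 'cart_id': 42},
        {'qty': 2, 'cart_id': 43},
    ],
}

total = inventory_doc['qty'] + sum(c['qty'] for c in inventory_doc['carted'])
assert total == 19  # 16 available + 1 carted + 2 carted
```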
To model the shopping cart objects, maintain a list of line items, each with a sku and quantity (qty), embedded in a shopping cart document:
{
_id: 42,
last_modified: ISODate("2012-03-09T20:55:36Z"),
status: 'active',
items: [
{ sku: '00e8da9b', qty: 1, item_details: {...} },
{ sku: '0ab42f88', qty: 4, item_details: {...} }
]
}
Note
The item_details field in each line item allows your application to display the cart contents to the user without requiring a second query to fetch details from the catalog collection.
This section introduces operations that you may use to support an e-commerce site. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
Moving an item from the available inventory to a cart is a fundamental requirement for a shopping cart system. The most important requirement is to ensure that your application will never move an unavailable item from the inventory to the cart.
Ensure that inventory is only updated if there is sufficient inventory to satisfy the request with the following add_item_to_cart function:
def add_item_to_cart(cart_id, sku, qty, details):
    now = datetime.utcnow()
    # Make sure the cart is still active and add the line item
    result = db.cart.update(
        {'_id': cart_id, 'status': 'active' },
        { '$set': { 'last_modified': now },
          '$push': {
              'items': {'sku': sku, 'qty': qty, 'details': details } } },
        w=1)
    if not result['updatedExisting']:
        raise CartInactive()
    # Update the inventory
    result = db.inventory.update(
        {'_id': sku, 'qty': {'$gte': qty}},
        {'$inc': {'qty': -qty},
         '$push': {
             'carted': { 'qty': qty, 'cart_id': cart_id,
                         'timestamp': now } } },
        w=1)
    if not result['updatedExisting']:
        # Roll back our cart update
        db.cart.update(
            {'_id': cart_id },
            { '$pull': { 'items': {'sku': sku } } })
        raise InadequateInventory()
The system does not trust that the available inventory can satisfy a request. First, this operation checks to make sure that the cart is “active” before adding an item. Then, it verifies that there is available inventory to satisfy the request before decrementing inventory.
If there is not adequate inventory, the system rolls back the cart update. Specifying w=1 and checking the result allows the application to report an error if the cart is inactive or the available quantity is insufficient to satisfy the request.
Note
This operation requires no indexes beyond the default index on the _id field.
The following process underlies adjusting the quantity of items in a user’s cart. When a user increases the quantity of an item, the application must, in addition to updating the carted entry for the user’s cart, verify that the inventory exists to cover the modification.
def update_quantity(cart_id, sku, old_qty, new_qty):
    now = datetime.utcnow()
    delta_qty = new_qty - old_qty
    # Make sure the cart is still active and update the line item
    result = db.cart.update(
        {'_id': cart_id, 'status': 'active', 'items.sku': sku },
        {'$set': {
            'last_modified': now,
            'items.$.qty': new_qty },
        },
        w=1)
    if not result['updatedExisting']:
        raise CartInactive()
    # Update the inventory
    result = db.inventory.update(
        {'_id': sku,
         'carted.cart_id': cart_id,
         'qty': {'$gte': delta_qty} },
        {'$inc': {'qty': -delta_qty },
         '$set': { 'carted.$.qty': new_qty, 'carted.$.timestamp': now } },
        w=1)
    if not result['updatedExisting']:
        # Roll back our cart update
        db.cart.update(
            {'_id': cart_id, 'items.sku': sku },
            {'$set': { 'items.$.qty': old_qty } })
        raise InadequateInventory()
Note
The positional operator $ updates the particular carted entry that matched the query.
This allows the application to update the inventory and keep track of the data needed to “rollback” the cart in a single atomic operation. The code also ensures that the cart is active.
Note
This operation requires no indexes beyond the default index on the _id field.
The checkout operation must validate the method of payment and remove the carted items after the transaction succeeds. Consider the following procedure:
def checkout(cart_id):
    now = datetime.utcnow()
    # Make sure the cart is still active and set to 'pending'. Also
    # fetch the cart details so we can calculate the checkout price
    cart = db.cart.find_and_modify(
        {'_id': cart_id, 'status': 'active' },
        update={'$set': { 'status': 'pending', 'last_modified': now } } )
    if cart is None:
        raise CartInactive()
    # Validate payment details; collect payment
    try:
        collect_payment(cart)
        db.cart.update(
            {'_id': cart_id },
            {'$set': { 'status': 'complete' } } )
        db.inventory.update(
            {'carted.cart_id': cart_id},
            {'$pull': {'carted': {'cart_id': cart_id} } },
            multi=True)
    except:
        db.cart.update(
            {'_id': cart_id },
            {'$set': { 'status': 'active' } } )
        raise
Begin by “locking” the cart by setting its status to “pending.” The findAndModify command makes it possible to verify that the cart is still active, update it atomically, and return its details so the application can calculate the checkout price and collect payment. On success, the operation marks the cart “complete” and removes the carted entries from inventory; on failure, it returns the cart to the “active” state and re-raises the exception.
Note
This operation requires no indexes beyond the default index on the _id field.
Periodically, your application must “expire” inactive carts and return their items to available inventory. In the example that follows the variable timeout controls the length of time before a cart expires:
def expire_carts(timeout):
    now = datetime.utcnow()
    threshold = now - timedelta(seconds=timeout)
    # Lock and find all the expiring carts
    db.cart.update(
        {'status': 'active', 'last_modified': { '$lt': threshold } },
        {'$set': { 'status': 'expiring' } },
        multi=True )
    # Actually expire each cart
    for cart in db.cart.find({'status': 'expiring'}):
        # Return all line items to inventory
        for item in cart['items']:
            db.inventory.update(
                { '_id': item['sku'],
                  'carted.cart_id': cart['_id'],
                  'carted.qty': item['qty'] },
                {'$inc': { 'qty': item['qty'] },
                 '$pull': { 'carted': { 'cart_id': cart['_id'] } } })
        db.cart.update(
            {'_id': cart['_id'] },
            {'$set': { 'status': 'expired' } })
This procedure first locks expiring carts by setting their status to “expiring,” then returns each cart’s line items to available inventory and marks the cart “expired.”
To support returning inventory from timed-out carts, create an index on the status and last_modified fields. Use the following operation in the Python/PyMongo console:
db.cart.ensure_index([('status', 1), ('last_modified', 1)])
The above operations do not account for one possible failure situation: an exception may occur after updating the shopping cart but before updating the inventory collection. This results in a shopping cart that may be absent or expired whose items have not been returned to available inventory.
To account for this case, your application needs a periodic cleanup operation that finds inventory items with old carted entries, checks whether each entry’s cart is still active, and returns the items to available inventory if it is not.
def cleanup_inventory(timeout):
    now = datetime.utcnow()
    threshold = now - timedelta(seconds=timeout)
    # Find all the expiring carted items
    for item in db.inventory.find(
            {'carted.timestamp': {'$lt': threshold }}):
        # Find all the carted items that matched
        carted = dict(
            (carted_item['cart_id'], carted_item)
            for carted_item in item['carted']
            if carted_item['timestamp'] < threshold)
        # First pass: for carts that are still active, refresh the
        # carted items rather than returning them to inventory
        for cart in db.cart.find(
                { '_id': {'$in': list(carted.keys()) },
                  'status': 'active'}):
            db.inventory.update(
                { '_id': item['_id'],
                  'carted.cart_id': cart['_id'] },
                { '$set': {'carted.$.timestamp': now } })
            del carted[cart['_id']]
        # Second pass: all the carted items left in the dict need to be
        # returned to inventory
        for cart_id, carted_item in carted.items():
            db.inventory.update(
                { '_id': item['_id'],
                  'carted.cart_id': cart_id,
                  'carted.qty': carted_item['qty'] },
                { '$inc': { 'qty': carted_item['qty'] },
                  '$pull': { 'carted': { 'cart_id': cart_id } } })
To summarize: this operation finds all “carted” entries with timestamps older than the threshold, then makes two passes over them. In the first pass, it refreshes the timestamps of entries whose carts are still active; in the second pass, it returns the remaining entries to available inventory.
Note
The function above is safe for use because it checks to ensure that the cart has expired before returning items from the cart to inventory. However, it could be long-running and slow other updates and queries.
Use judiciously.
If you need to shard the data for this system, the _id field is an ideal shard key for both carts and products because most update operations use the _id field. This allows mongos to route all updates that select on _id to a single mongod process.
There are two drawbacks for using _id as a shard key:
If the cart collection’s _id is an incrementing value, all new carts end up on a single shard.
You can mitigate this effect by choosing a random value upon the creation of a cart, such as a hash (i.e. MD5 or SHA-1) of an ObjectID, as the _id. The process for this operation would resemble the following:
import hashlib
import bson
cart_id = bson.ObjectId()
cart_id_hash = hashlib.md5(str(cart_id)).hexdigest()
cart = { "_id": cart_id_hash }
db.cart.insert(cart)
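A quick illustration of why hashing helps, using hypothetical sequential identifiers in place of ObjectIds: MD5 digests of consecutive inputs are distinct and spread across the keyspace, so monotonically increasing cart ids no longer cluster on one shard.

```python
import hashlib

# Hypothetical sequential cart ids; in the application these would be
# ObjectIds generated at cart creation.
ids = ['cart-%05d' % i for i in range(4)]
hashes = [hashlib.md5(i.encode()).hexdigest() for i in ids]

assert len(set(hashes)) == len(hashes)    # all digests distinct
assert all(len(h) == 32 for h in hashes)  # 128-bit hex digests
```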
Cart expiration and inventory adjustment require update operations and queries to be broadcast to all shards when using _id as the shard key.
This may be less relevant as the expiration functions run relatively infrequently and you can queue them or artificially slow them down (as with judicious use of sleep()) to minimize server load.
Use the following commands in the Python/PyMongo console to shard the cart and inventory collections:
>>> db.command('shardCollection', 'inventory', {
...     'key': { '_id': 1 } })
{ "collectionsharded" : "inventory", "ok" : 1 }
>>> db.command('shardCollection', 'cart', {
...     'key': { '_id': 1 } })
{ "collectionsharded" : "cart", "ok" : 1 }
This document provides the basic design for modeling a product hierarchy stored in MongoDB as well as a collection of common operations for interacting with this data that will help you begin to write an E-commerce product category hierarchy.
To model a product category hierarchy, this solution keeps each category in its own document that also has a list of its ancestors or “parents.” This document uses music genres as the basis of its examples:
Initial category hierarchy
Because these kinds of categories change infrequently, this model focuses on the operations needed to keep the hierarchy up-to-date rather than the performance profile of update operations.
This schema has the following properties:
Consider the following prototype:
{ "_id" : ObjectId("4f5ec858eb03303a11000002"),
"name" : "Modal Jazz",
"parent" : ObjectId("4f5ec858eb03303a11000001"),
"slug" : "modal-jazz",
"ancestors" : [
{ "_id" : ObjectId("4f5ec858eb03303a11000001"),
"slug" : "bop",
"name" : "Bop" },
{ "_id" : ObjectId("4f5ec858eb03303a11000000"),
"slug" : "ragtime",
"name" : "Ragtime" } ]
}
This section outlines the category hierarchy manipulations that you may need in an E-Commerce site. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
Use the following operation to read and display a category hierarchy. This query uses the slug field to return the category information and a “bread crumb” trail from the current category to the top-level category.
category = db.categories.find(
{'slug':slug},
{'_id':0, 'name':1, 'ancestors.slug':1, 'ancestors.name':1 })
Create a unique index on the slug field with the following operation on the Python/PyMongo console:
>>> db.categories.ensure_index('slug', unique=True)
To add a category you must first determine its ancestors. Take adding a new category “Swing” as a child of “Ragtime”, as below:
Adding a category
The insert operation would be trivial except for the ancestors. To define this array, consider the following helper function:
def build_ancestors(_id, parent_id):
    parent = db.categories.find_one(
        {'_id': parent_id},
        {'name': 1, 'slug': 1, 'ancestors': 1})
    parent_ancestors = parent.pop('ancestors')
    ancestors = [ parent ] + parent_ancestors
    db.categories.update(
        {'_id': _id},
        {'$set': { 'ancestors': ancestors } })
You only need to travel “up” one level in the hierarchy to get the ancestor list for “Ragtime” that you can use to build the ancestor list for “Swing.” Then create a document with the following set of operations:
doc = dict(name='Swing', slug='swing', parent=ragtime_id)
swing_id = db.categories.insert(doc)
build_ancestors(swing_id, ragtime_id)
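A pure-Python sketch of the ancestor-list construction performed by build_ancestors, with the data inlined rather than fetched from MongoDB: the new category's ancestors are its parent's summary (name, slug, _id) prepended to the parent's own ancestor list.

```python
# "Ragtime" is a top-level category, so its ancestors list is empty.
ragtime = {'_id': 0, 'name': 'Ragtime', 'slug': 'ragtime', 'ancestors': []}

# As fetched by find_one: take a copy, pop off the parent's own
# ancestors field, and prepend the remaining summary document.
swing_parent = dict(ragtime)
parent_ancestors = swing_parent.pop('ancestors')
swing_ancestors = [swing_parent] + parent_ancestors

assert [a['slug'] for a in swing_ancestors] == ['ragtime']
assert 'ancestors' not in swing_ancestors[0]  # summary only, not nested
```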
Note
Since these queries and updates all select based on _id, you only need the default MongoDB-supplied index on _id to support this operation efficiently.
This section addresses the process for reorganizing the hierarchy by moving “bop” under “swing,” as follows:
Change the parent of a category
Update the bop document to reflect the change in ancestry with the following operation:
db.categories.update(
{'_id':bop_id}, {'$set': { 'parent': swing_id } } )
The following helper function rebuilds the ancestor fields to ensure correctness. [1]
def build_ancestors_full(_id, parent_id):
    ancestors = []
    while parent_id is not None:
        parent = db.categories.find_one(
            {'_id': parent_id},
            {'parent': 1, 'name': 1, 'slug': 1, 'ancestors': 1})
        parent_id = parent.pop('parent')
        ancestors.append(parent)
    db.categories.update(
        {'_id': _id},
        {'$set': { 'ancestors': ancestors } })
You can use the following loop to reconstruct all the descendants of the “bop” category:
for cat in db.categories.find(
        {'ancestors._id': bop_id},
        {'parent': 1}):
    build_ancestors_full(cat['_id'], cat['parent'])
[1] Your application cannot guarantee that the ancestor list of a parent category is correct, because MongoDB may process the categories out-of-order.
Create an index on the ancestors._id field to support the update operation:
db.categories.ensure_index('ancestors._id')
To rename a category you need to update both the category itself and all of its descendants. Consider renaming “Bop” to “BeBop,” as in the following figure:
Rename a category
First, you need to update the category name with the following operation:
db.categories.update(
{'_id':bop_id}, {'$set': { 'name': 'BeBop' } } )
Next, you need to update each descendant’s ancestors list:
db.categories.update(
{'ancestors._id': bop_id},
{'$set': { 'ancestors.$.name': 'BeBop' } },
multi=True)
This operation uses the positional operator $ to update the matching ancestor entry within each descendant’s ancestors array, and the multi=True option to update every document that matches the query.
Note
In this case, the index you have already defined on ancestors._id is sufficient to ensure good performance.
For most deployments, sharding this collection has limited value because the collection will be very small. If you do need to shard, because most updates query the _id field, this field is a suitable shard key. Shard the collection with the following operation in the Python/PyMongo console.
>>> db.command('shardCollection', 'categories', {
... 'key': {'_id': 1} })
{ "collectionsharded" : "categories", "ok" : 1 }
The content management use cases introduce fundamental MongoDB practices and approaches, using familiar problems and simple examples. The “Metadata and Asset Management” document introduces a model that you may use when designing a web site content management system, while “Storing Comments” introduces the method for modeling user comments on content, like blog posts, and media, in MongoDB.
This document describes the design and pattern of a content management system using MongoDB modeled on the popular Drupal CMS.
You are designing a content management system (CMS) and you want to use MongoDB to store the content of your sites.
To build this system you will use MongoDB’s flexible schema to store all content “nodes” in a single collection regardless of type. This guide will provide prototype schema and describe common operations for the following primary node types:
This solution does not describe schema or process for storing or using navigational and organizational information.
Although documents in the nodes collection contain content of different types, all documents have a similar structure and a set of common fields. Consider the following prototype document for a “basic page” node type:
{
_id: ObjectId(…),
nonce: ObjectId(…),
metadata: {
type: 'basic-page',
section: 'my-photos',
slug: 'about',
title: 'About Us',
created: ISODate(...),
author: { _id: ObjectId(…), name: 'Rick' },
tags: [ ... ],
detail: { text: '# About Us\n…' }
}
}
Most fields are descriptively titled. The section field identifies groupings of items, as in a photo gallery or a particular blog. The slug field holds a URL-friendly representation of the node, usually unique within its section, for generating URLs.
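The document does not specify how slugs are produced. As one hypothetical approach (the slugify helper below is not part of the schema), a slug might be derived by lowercasing the title and collapsing runs of non-alphanumeric characters into hyphens:

```python
import re

# Hypothetical helper: derive a URL-friendly slug from a title by
# lowercasing, replacing runs of non-alphanumeric characters with
# hyphens, and trimming leading/trailing hyphens.
def slugify(title):
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    return slug.strip('-')

assert slugify('About Us') == 'about-us'
assert slugify('2012-03: Noticed the News!') == '2012-03-noticed-the-news'
```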
All documents also have a detail field that varies with the document type. For the basic page above, the detail field might hold the text of the page. For a blog entry, the detail field might hold a sub-document. Consider the following prototype:
{
…
metadata: {
…
type: 'blog-entry',
section: 'my-blog',
slug: '2012-03-noticed-the-news',
…
detail: {
publish_on: ISODate(…),
text: 'I noticed the news from Washington today…'
}
}
}
Photos require a different approach. Because photos can be much larger than these documents, it’s important to separate the binary photo storage from the node’s metadata.
GridFS provides the ability to store larger files in MongoDB. GridFS stores data in two collections, in this case, cms.assets.files, which stores metadata, and cms.assets.chunks which stores the data itself. Consider the following prototype document from the cms.assets.files collection:
{
_id: ObjectId(…),
length: 123...,
chunkSize: 262144,
uploadDate: ISODate(…),
contentType: 'image/jpeg',
md5: 'ba49a...',
metadata: {
nonce: ObjectId(…),
slug: '2012-03-invisible-bicycle',
type: 'photo',
section: 'my-album',
title: 'Kitteh',
created: ISODate(…),
author: { _id: ObjectId(…), name: 'Jared' },
tags: [ … ],
detail: {
filename: 'kitteh_invisible_bike.jpg',
resolution: [ 1600, 1600 ], … }
}
}
Note
This document embeds the basic node document fields, which allows you to use the same code to manipulate nodes, regardless of type.
This section outlines a number of common operations for building and interacting with the metadata and asset layer of the CMS for all node types. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
The most common operations inside of a CMS center on creating and editing content. Consider the following insert() operation:
db.cms.nodes.insert({
'nonce': ObjectId(),
'metadata': {
'section': 'myblog',
'slug': '2012-03-noticed-the-news',
'type': 'blog-entry',
'title': 'Noticed in the News',
'created': datetime.utcnow(),
'author': { 'id': user_id, 'name': 'Rick' },
'tags': [ 'news', 'musings' ],
'detail': {
'publish_on': datetime.utcnow(),
'text': 'I noticed the news from Washington today…' }
}
})
Once inserted, your application must have some way of preventing multiple concurrent updates. The schema uses the special nonce field to detect concurrent edits. By including the nonce field in the query portion of the update operation, the application will report an error if there is an editing collision. Consider the following update:
def update_text(section, slug, nonce, text):
    result = db.cms.nodes.update(
        { 'metadata.section': section,
          'metadata.slug': slug,
          'nonce': nonce },
        { '$set': {'metadata.detail.text': text, 'nonce': ObjectId() } },
        w=1)
    if not result['updatedExisting']:
        raise ConflictError()
You may also want to perform metadata edits to the item such as adding tags:
db.cms.nodes.update(
    { 'metadata.section': section, 'metadata.slug': slug },
    { '$addToSet': { 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } })
In this example the $addToSet operator only adds values to the tags array if they do not already exist there. Because this update does not change the content, there is no need to supply or update the nonce.
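To see the $addToSet semantics in isolation, here is a pure-Python analogue (illustrative only; MongoDB performs this server-side): each value is appended only if not already present, so repeated updates are idempotent.

```python
# Pure-Python analogue of $addToSet with $each: append each value only
# if it is not already in the array.
def add_to_set_each(array, values):
    for v in values:
        if v not in array:
            array.append(v)
    return array

tags = ['news', 'musings']
add_to_set_each(tags, ['interesting', 'funny'])
add_to_set_each(tags, ['interesting', 'funny'])  # idempotent: no duplicates
assert tags == ['news', 'musings', 'interesting', 'funny']
```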
To support updates and queries on the metadata.section and metadata.slug fields, and to ensure that two editors don’t create two documents with the same section name and slug, use the following operation at the Python/PyMongo console:
>>> db.cms.nodes.ensure_index([
... ('metadata.section', 1), ('metadata.slug', 1)], unique=True)
The unique=True option prevents two documents from colliding. If you want an index to support queries on the above fields and the nonce field, create the following index:
>>> db.cms.nodes.ensure_index([
... ('metadata.section', 1), ('metadata.slug', 1), ('nonce', 1) ])
However, in most cases, the first index will be sufficient to support these operations.
To update a photo object, use the following operation, which builds upon the basic update procedure:
def upload_new_photo(
        input_file, section, slug, title, author, tags, detail):
    fs = GridFS(db, 'cms.assets')
    with fs.new_file(
            content_type='image/jpeg',
            metadata=dict(
                type='photo',
                locked=datetime.utcnow(),
                section=section,
                slug=slug,
                title=title,
                created=datetime.utcnow(),
                author=author,
                tags=tags,
                detail=detail)) as upload_file:
        while True:
            chunk = input_file.read(upload_file.chunk_size)
            if not chunk: break
            upload_file.write(chunk)
    # unlock the file
    db.cms.assets.files.update(
        {'_id': upload_file._id},
        {'$set': { 'metadata.locked': None } } )
Because uploading the photo spans multiple documents and is a non-atomic operation, you must “lock” the file during upload by writing datetime.utcnow() in the record. This helps when there are multiple concurrent editors and lets the application detect stalled file uploads. This operation assumes that, for photo upload, the last update will succeed:
def update_photo_content(input_file, section, slug):
    fs = GridFS(db, 'cms.assets')
    # Delete the old version if it's unlocked or was locked more than 5
    # minutes ago
    file_obj = db.cms.assets.files.find_one(
        { 'metadata.section': section,
          'metadata.slug': slug,
          'metadata.locked': None })
    if file_obj is None:
        threshold = datetime.utcnow() - timedelta(seconds=300)
        file_obj = db.cms.assets.files.find_one(
            { 'metadata.section': section,
              'metadata.slug': slug,
              'metadata.locked': { '$lt': threshold } })
    if file_obj is None: raise FileDoesNotExist()
    fs.delete(file_obj['_id'])
    # update content, keep metadata unchanged
    file_obj['metadata']['locked'] = datetime.utcnow()
    with fs.new_file(**file_obj) as upload_file:
        while True:
            chunk = input_file.read(upload_file.chunk_size)
            if not chunk: break
            upload_file.write(chunk)
    # unlock the file
    db.cms.assets.files.update(
        {'_id': upload_file._id},
        {'$set': { 'metadata.locked': None } } )
As with the basic operations, you can use a much simpler operation to edit the tags:
db.cms.assets.files.update(
{ 'metadata.section': section, 'metadata.slug': slug },
{ '$addToSet': { 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } })
Create a unique index on { metadata.section: 1, metadata.slug: 1 } to support the above operations and prevent users from creating or updating the same file concurrently. Use the following operation in the Python/PyMongo console:
>>> db.cms.assets.files.ensure_index([
... ('metadata.section', 1), ('metadata.slug', 1)], unique=True)
To locate a node based on the value of metadata.section and metadata.slug, use the following find_one operation.
node = db.cms.nodes.find_one({'metadata.section': section, 'metadata.slug': slug })
Note
The (section, slug) index created to support update operations is sufficient to support this operation as well.
To locate an image based on the value of metadata.section and metadata.slug, use the following find_one operation.
fs = GridFS(db, 'cms.assets')
with fs.get_version({'metadata.section': section, 'metadata.slug': slug }) as img_fpo:
# do something with the image file
Note
The (section, slug) index created to support update operations is sufficient to support this operation as well.
To retrieve a list of nodes based on their tags, use the following query:
nodes = db.cms.nodes.find({'metadata.tags': tag })
Create an index on the metadata.tags field in the cms.nodes collection to support this query:
>>> db.cms.nodes.ensure_index('metadata.tags')
To retrieve a list of images based on their tags, use the following operation:
fs = GridFS(db, 'cms.assets')
image_file_objects = db.cms.assets.files.find({'metadata.tags': tag })
for image_file_object in image_file_objects:
    image_file = fs.get(image_file_object['_id'])
    # do something with the image file
Create an index on the metadata.tags field in the cms.assets.files collection to support this query:
>>> db.cms.assets.files.ensure_index('metadata.tags')
Use the following operation to generate a list of recent blog posts sorted in descending order by date, for use on the index page of your site, or in an .rss or .atom feed.
articles = db.cms.nodes.find({
    'metadata.section': 'my-blog',
    'metadata.published': { '$lt': datetime.utcnow() } })
articles = articles.sort([('metadata.published', -1)])
Note
In many cases you will want to limit the number of nodes returned by this query.
Create a compound index on the { metadata.section: 1, metadata.published: -1 } fields to support this query and sort operation:
>>> db.cms.nodes.ensure_index(
... [ ('metadata.section', 1), ('metadata.published', -1) ])
Note
For all sort or range queries, ensure that the field with the sort or range operation is the final field in the index.
In a CMS, read performance is more critical than write performance. To achieve the best read performance in a sharded cluster, ensure that the mongos can route queries to specific shards.
Also remember that MongoDB cannot enforce unique indexes across shards. Using a compound shard key that consists of metadata.section and metadata.slug will provide the same semantics as described above.
Warning
Consider the actual use and workload of your cluster before configuring sharding for your cluster.
Use the following operation at the Python/PyMongo shell:
>>> db.command('shardCollection', 'cms.nodes', {
...     'key': { 'metadata.section': 1, 'metadata.slug': 1 } })
{ "collectionsharded": "cms.nodes", "ok": 1}
>>> db.command('shardCollection', 'cms.assets.files', {
...     'key': { 'metadata.section': 1, 'metadata.slug': 1 } })
{ "collectionsharded": "cms.assets.files", "ok": 1}
To shard the cms.assets.chunks collection, you must use the _id field as the shard key. The following operation will shard the collection
>>> db.command('shardCollection', 'cms.assets.chunks', {
...     'key': { 'files_id': 1 } })
{ "collectionsharded": "cms.assets.chunks", "ok": 1}
Sharding on the files_id field ensures routable queries because all reads from GridFS must first look up the document in cms.assets.files and then look up the chunks separately.
This document outlines the basic patterns for storing user-submitted comments in a content management system (CMS).
MongoDB provides a number of different approaches for storing data like user comments on CMS content. There is no single correct implementation; rather, there are several common approaches, each with known considerations. This case study explores the implementation details and trade-offs of each option. The three basic patterns are:
Store each comment in its own document.
This approach provides the greatest flexibility at the expense of some additional application level complexity.
These implementations make it possible to display comments in chronological or threaded order, and place no restrictions on the number of comments attached to a specific object.
Embed all comments in the “parent” document.
This approach provides the greatest possible performance for displaying comments at the expense of flexibility: the structure of the comments in the document controls the display format.
Note
Because of the limit on document size, documents, including the original content and all comments, cannot grow beyond 16 megabytes.
A hybrid design stores comments separately from the “parent,” but aggregates comments into a small number of documents, each of which contains many comments.
Also consider that comments can be threaded, where comments are always replies to the “parent” item or to another comment, which carries certain architectural requirements discussed below.
If you store each comment in its own document, the documents in your comments collection would have the following structure:
{
    _id: ObjectId(...),
    discussion_id: ObjectId(...),
    slug: '34db',
    posted: ISODateTime(...),
    author: {
        id: ObjectId(...),
        name: 'Rick'
    },
    text: 'This is so bogus ... '
}
This form is only suitable for displaying comments in chronological order. Each comment stores the discussion_id of the parent discussion, a URL-friendly slug, the posted date, author information, and the comment text.
To support threaded comments, you might use a slightly different structure like the following:
{
    _id: ObjectId(...),
    discussion_id: ObjectId(...),
    parent_id: ObjectId(...),
    slug: '34db/8bda',
    full_slug: '2012.02.08.12.21.08:34db/2012.02.09.22.19.16:8bda',
    posted: ISODateTime(...),
    author: {
        id: ObjectId(...),
        name: 'Rick'
    },
    text: 'This is so bogus ... '
}
This structure adds a parent_id field, extends the slug into a materialized path, and adds a full_slug field that prefixes each path component with a timestamp so the path sorts chronologically.
Warning
MongoDB can only index 1024 bytes per index entry. This includes all field data, the field name, and the namespace (i.e. database name and collection name). This may become an issue when you create an index on the full_slug field to support sorting.
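As a rough sanity check (not part of the original text), you can measure how many bytes a full_slug value contributes toward that limit; the slug value below is hypothetical, and deep threads with long materialized paths approach the limit fastest:

```python
# Hypothetical full_slug for one root comment and one reply.
full_slug = '2012.02.08.12.21.08:34db/2012.02.09.22.19.16:8bda'

# The indexed key size grows with the UTF-8 encoded length of the value
# (plus per-entry overhead for field name and namespace).
print(len(full_slug.encode('utf-8')))  # 49
```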
This section contains an overview of common operations for interacting with comments represented using a schema where each comment is its own document.
All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose. Issue the following commands at the interactive Python shell to load the required libraries:
>>> import bson
>>> import pymongo
>>> from datetime import datetime
To post a new comment in a chronologically ordered (i.e. without threading) system, use the following insert() operation:
slug = generate_pseudorandom_slug()
db.comments.insert({
    'discussion_id': discussion_id,
    'slug': slug,
    'posted': datetime.utcnow(),
    'author': author_info,
    'text': comment_text })
To insert a comment in a system with threaded comments, you must generate the slug path and full_slug at insert time. See the following operation:
posted = datetime.utcnow()
# generate the unique portions of the slug and full_slug
slug_part = generate_pseudorandom_slug()
full_slug_part = posted.strftime('%Y.%m.%d.%H.%M.%S') + ':' + slug_part
# load the parent comment (if any)
if parent_slug:
    parent = db.comments.find_one(
        {'discussion_id': discussion_id, 'slug': parent_slug })
    slug = parent['slug'] + '/' + slug_part
    full_slug = parent['full_slug'] + '/' + full_slug_part
else:
    slug = slug_part
    full_slug = full_slug_part
# actually insert the comment
db.comments.insert({
    'discussion_id': discussion_id,
    'slug': slug,
    'full_slug': full_slug,
    'posted': posted,
    'author': author_info,
    'text': comment_text })
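As a minimal, standalone illustration of the slug logic above (using a hypothetical fixed timestamp and slug value), the zero-padded strftime prefix is what makes lexical order match chronological order:

```python
from datetime import datetime

def make_full_slug_part(posted, slug_part):
    # Zero-padded timestamp prefix: lexical sort order == chronological order.
    return posted.strftime('%Y.%m.%d.%H.%M.%S') + ':' + slug_part

# Hypothetical values matching the prototype documents shown earlier.
posted = datetime(2012, 2, 8, 12, 21, 8)
print(make_full_slug_part(posted, '34db'))  # 2012.02.08.12.21.08:34db
```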
To view comments that are not threaded, select all comments participating in a discussion and sort by the posted field. For example:
cursor = db.comments.find({'discussion_id': discussion_id})
cursor = cursor.sort('posted')
cursor = cursor.skip(page_num * page_size)
cursor = cursor.limit(page_size)
Because the full_slug field contains both hierarchical information (via the path) and chronological information, you can use a simple sort on the full_slug field to retrieve a threaded view:
cursor = db.comments.find({'discussion_id': discussion_id})
cursor = cursor.sort('full_slug')
cursor = cursor.skip(page_num * page_size)
cursor = cursor.limit(page_size)
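To see why a plain sort on full_slug yields a threaded view, consider this standalone sketch with hypothetical slug values: lexical order keeps each comment immediately ahead of its descendants (prefix ordering), and orders siblings chronologically via the timestamp prefix.

```python
# One root comment and two replies (hypothetical full_slug values).
full_slugs = [
    '2012.02.08.12.21.08:34db/2012.02.10.08.05.00:c1ed',
    '2012.02.08.12.21.08:34db',
    '2012.02.08.12.21.08:34db/2012.02.09.22.19.16:8bda',
]

# Sorting lexically: root first, then its replies in posting order.
threaded = sorted(full_slugs)
for fs in threaded:
    print(fs)
```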
To support the above queries efficiently, maintain two compound indexes: one on (discussion_id, posted) and one on (discussion_id, full_slug).
Issue the following operation at the interactive Python shell.
>>> db.comments.ensure_index([
... ('discussion_id', 1), ('posted', 1)])
>>> db.comments.ensure_index([
... ('discussion_id', 1), ('full_slug', 1)])
Note
Ensure that you always sort by the final element in a compound index to maximize the performance of these queries.
To directly retrieve a comment, without needing to page through all comments, you can select by the slug field:
comment = db.comments.find_one({
    'discussion_id': discussion_id,
    'slug': comment_slug})
You can retrieve a “sub-discussion,” or a comment and all of its descendants recursively, by performing a regular expression prefix query on the full_slug field:
import re
subdiscussion = db.comments.find({
    'discussion_id': discussion_id,
    'full_slug': re.compile('^' + re.escape(parent_slug)) })
subdiscussion = subdiscussion.sort('full_slug')
Since you have already created an index on { discussion_id: 1, full_slug: 1 } to support retrieving sub-discussions, you can support the direct retrieval query above by adding an index on { discussion_id: 1, slug: 1 }. Use the following operation in the Python shell:
>>> db.comments.ensure_index([
... ('discussion_id', 1), ('slug', 1)])
This design embeds the entire discussion of a comment thread inside of the topic document. In this example, the “topic” document holds the total content for whatever content you’re managing.
Consider the following prototype topic document:
{
    _id: ObjectId(...),
    ... lots of topic data ...
    comments: [
        { posted: ISODateTime(...),
          author: { id: ObjectId(...), name: 'Rick' },
          text: 'This is so bogus ... ' },
    ... ]
}
This structure is only suitable for a chronological display of all comments because it embeds comments in chronological order. Each document in the comments array contains the comment’s date, author, and text.
Note
Since you’re storing the comments in sorted order, there is no need to maintain per-comment slugs.
To support threading using this design, you would need to embed comments within comments, using a structure that resembles the following:
{
    _id: ObjectId(...),
    ... lots of topic data ...
    replies: [
        { posted: ISODateTime(...),
          author: { id: ObjectId(...), name: 'Rick' },
          text: 'This is so bogus ... ',
          replies: [
            { author: { ... }, ... },
          ... ]
        },
    ... ]
}
Here, the replies field in each comment holds the sub-comments, which can in turn hold sub-comments.
Note
In the embedded document design, you give up some flexibility regarding display format, because it is difficult to display comments except as you store them in MongoDB.
If, in the future, you want to switch from chronological to threaded or from threaded to chronological, this design would make that migration quite expensive.
Warning
Remember that BSON documents have a 16 megabyte size limit. If popular discussions grow larger than 16 megabytes, additional document growth will fail.
Additionally, when MongoDB documents grow significantly after creation you will experience greater storage fragmentation and degraded update performance while MongoDB migrates documents internally.
This section contains an overview of common operations for interacting with comments represented using a schema that embeds all comments in the document of the “parent” or topic content.
Note
For all operations below, there is no need for any new indexes since all of the operations function within individual documents. Because you would retrieve these documents by the _id field, you can rely on the index that MongoDB creates automatically.
To post a new comment in a chronologically ordered (i.e. unthreaded) system, you need the following update():
db.discussion.update(
    { 'discussion_id': discussion_id },
    { '$push': { 'comments': {
        'posted': datetime.utcnow(),
        'author': author_info,
        'text': comment_text } } } )
The $push operator inserts comments into the comments array in correct chronological order. For threaded discussions, the update() operation is more complex. To reply to a comment, the following code assumes that it can retrieve the ‘path’ for the parent comment as a list of positions:
if path != []:
    str_path = '.'.join('replies.%d' % part for part in path)
    str_path += '.replies'
else:
    str_path = 'replies'

db.discussion.update(
    { 'discussion_id': discussion_id },
    { '$push': {
        str_path: {
            'posted': datetime.utcnow(),
            'author': author_info,
            'text': comment_text } } } )
This constructs a field name of the form replies.0.replies.2... as str_path and then uses this value with the $push operator to insert the new comment into the parent comment’s replies array.
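The path-to-field-name construction can be exercised standalone; the positions below are hypothetical:

```python
def reply_path(path):
    # path lists the positions of the ancestor comments, e.g. [0, 2] means
    # the new comment replies to the third reply of the first top-level
    # comment, so it is pushed into that reply's 'replies' array.
    if path:
        return '.'.join('replies.%d' % part for part in path) + '.replies'
    return 'replies'

print(reply_path([0, 2]))  # replies.0.replies.2.replies
print(reply_path([]))      # replies
```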
To view the comments in a non-threaded design, you must use the $slice operator:
discussion = db.discussion.find_one(
    {'discussion_id': discussion_id},
    { ... some fields relevant to your page from the root discussion ...,
      'comments': { '$slice': [ page_num * page_size, page_size ] }
    })
To return paginated comments for the threaded design, you must retrieve the whole document and paginate the comments within the application:
import itertools

discussion = db.discussion.find_one({'discussion_id': discussion_id})

def iter_comments(obj):
    for reply in obj['replies']:
        yield reply
        for subreply in iter_comments(reply):
            yield subreply

paginated_comments = itertools.islice(
    iter_comments(discussion),
    page_size * page_num,
    page_size * (page_num + 1))
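The generator-plus-islice pattern can be tested standalone with hypothetical nested data (the traversal helper is redefined here so the snippet runs on its own):

```python
import itertools

def iter_comments(obj):
    # Depth-first walk over the nested 'replies' arrays.
    for reply in obj.get('replies', []):
        yield reply
        for subreply in iter_comments(reply):
            yield subreply

# Hypothetical discussion: comment 1 has one reply (2); comment 3 is top-level.
discussion = {'replies': [
    {'id': 1, 'replies': [{'id': 2, 'replies': []}]},
    {'id': 3, 'replies': []},
]}

# Take the "page" covering flattened positions 1..2.
page = list(itertools.islice(iter_comments(discussion), 1, 3))
print([c['id'] for c in page])  # [2, 3]
```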
Instead of retrieving comments via slugs as above, the following example retrieves comments using their position in the comment list or tree.
For chronological (i.e. non-threaded) comments, just use the $slice operator to extract a comment, as follows:
discussion = db.discussion.find_one(
    {'discussion_id': discussion_id},
    {'comments': { '$slice': [ position, 1 ] } })
comment = discussion['comments'][0]
For threaded comments, you must find the correct path through the tree in your application, as follows:
discussion = db.discussion.find_one({'discussion_id': discussion_id})
current = discussion
for part in path:
    current = current['replies'][part]
comment = current
Note
Since parent comments embed child replies, this operation actually retrieves the entire sub-discussion for the comment you queried for.
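A standalone sketch of that path traversal, using a plain dict in place of a fetched document (hypothetical data and path):

```python
# Hypothetical fetched discussion document.
discussion = {'replies': [
    {'text': 'first', 'replies': [
        {'text': 'first reply', 'replies': []},
    ]},
]}

current = discussion
for part in [0, 0]:  # hypothetical path: first comment, then its first reply
    current = current['replies'][part]
print(current['text'])  # first reply
```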
In the “hybrid approach” you will store comments in “buckets” that hold about 100 comments. Consider the following example document:
{
    _id: ObjectId(...),
    discussion_id: ObjectId(...),
    page: 1,
    count: 42,
    comments: [
        { slug: '34db',
          posted: ISODateTime(...),
          author: { id: ObjectId(...), name: 'Rick' },
          text: 'This is so bogus ... ' },
    ... ]
}
Each document maintains page and count fields, holding the page number and the number of comments on the page, in addition to the comments array that holds the comments themselves.
Note
Using a hybrid format makes storing threaded comments complex, and this specific configuration is not covered in this document.
Also, 100 comments is a soft limit for the number of comments per page. This value is arbitrary: choose a value that will prevent the maximum document size from growing beyond the 16MB BSON document size limit, but large enough to ensure that most comment threads will fit in a single document. In some situations the number of comments per document can exceed 100, but this does not affect the correctness of the pattern.
This section contains a number of common operations that you may use when building a CMS using this hybrid storage model, with documents that hold pages of roughly 100 comments.
All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
In order to post a new comment, you need to $push the comment onto the last page and $inc that page’s comment count. Consider the following example that queries on the basis of a discussion_id field:
page = db.comment_pages.find_and_modify(
    { 'discussion_id': discussion['_id'],
      'page': discussion['num_pages'] },
    { '$inc': { 'count': 1 },
      '$push': {
        'comments': { 'slug': slug, ... } } },
    fields={'count': 1},
    upsert=True,
    new=True )
The find_and_modify() operation is an upsert: if MongoDB cannot find a document with the correct page number, find_and_modify() will create it and initialize the new document with appropriate values for count and comments.
To limit the number of comments per page to roughly 100, you will need to create new pages as they become necessary. Add the following logic to support this:
if page['count'] > 100:
    db.discussion.update(
        { 'discussion_id': discussion['_id'],
          'num_pages': discussion['num_pages'] },
        { '$inc': { 'num_pages': 1 } } )
This update() operation includes the last known number of pages in the query to prevent a race condition in which the number of pages increments twice, which would result in a nearly or totally empty document. If another process has already incremented the number of pages, then the update above does nothing.
To support the find_and_modify() and update() operations, maintain a compound index on (discussion_id, page) in the comment_pages collection, by issuing the following operation at the Python/PyMongo console:
>>> db.comment_pages.ensure_index([
... ('discussion_id', 1), ('page', 1)])
As an example, the following function paginates comments with a fixed page size (i.e. not the roughly 100-comment documents used in the example above):
def find_comments(discussion_id, skip, limit):
    result = []
    page_query = db.comment_pages.find(
        { 'discussion_id': discussion_id },
        { 'count': 1, 'comments': { '$slice': [ skip, limit ] } })
    page_query = page_query.sort('page')
    for page in page_query:
        result += page['comments']
        skip = max(0, skip - page['count'])
        limit -= len(page['comments'])
        if limit == 0:
            break
    return result
Here, the $slice operator pulls out comments from each page, but only when this satisfies the skip requirement. For example: if you have four pages with 100, 102, 101, and 22 comments, and you wish to retrieve comments where skip=300 and limit=50, the algorithm proceeds as follows:
| Skip | Limit | Discussion |
|---|---|---|
| 300 | 50 | {$slice: [ 300, 50 ] } matches nothing in page #1; subtract page #1’s count from skip and continue. |
| 200 | 50 | {$slice: [ 200, 50 ] } matches nothing in page #2; subtract page #2’s count from skip and continue. |
| 98 | 50 | {$slice: [ 98, 50 ] } matches 3 comments in page #3; subtract page #3’s count from skip (saturating at 0), subtract 3 from limit, and continue. |
| 0 | 47 | {$slice: [ 0, 47 ] } matches all 22 comments in page #4; subtract 22 from limit and continue. |
| 0 | 25 | There are no more pages; terminate the loop. |
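The skip/limit arithmetic in this walk can be simulated with a short standalone function; the page counts here are hypothetical:

```python
def slice_walk(page_counts, skip, limit):
    # Mirror the find_comments loop: compute how many comments
    # $slice [skip, limit] would return on each page, then adjust
    # skip (saturating at 0) and limit.
    matched_per_page = []
    for count in page_counts:
        matched = min(max(count - skip, 0), limit)
        matched_per_page.append(matched)
        skip = max(0, skip - count)
        limit -= matched
        if limit == 0:
            break
    return matched_per_page

# Four pages with 100, 102, 101, and 22 comments; skip=300, limit=50.
print(slice_walk([100, 102, 101, 22], 300, 50))  # [0, 0, 3, 22]
```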
Note
Since you already have an index on (discussion_id, page) in your comment_pages collection, MongoDB can satisfy these queries efficiently.
To retrieve a comment directly without paging through all preceding pages of commentary, use the slug to find the correct page, and then use application logic to find the correct comment:
page = db.comment_pages.find_one(
    { 'discussion_id': discussion_id,
      'comments.slug': comment_slug},
    { 'comments': 1 })
for comment in page['comments']:
    if comment['slug'] == comment_slug:
        break
To perform this query efficiently you’ll need a new index on the discussion_id and comments.slug fields (i.e. { discussion_id: 1, 'comments.slug': 1 }). Create this index using the following operation in the Python/PyMongo console:
>>> db.comment_pages.ensure_index([
... ('discussion_id', 1), ('comments.slug', 1)])
For all of the architectures discussed above, you will want the discussion_id field to participate in the shard key if you need to shard your application.
For applications that use the “one document per comment” approach, consider using slug (or full_slug, in the case of threaded comments) fields in the shard key to allow the mongos instances to route requests by slug. Issue the following operation at the Python/PyMongo console:
>>> db.command('shardCollection', 'comments', {
... 'key' : { 'discussion_id' : 1, 'full_slug': 1 } })
This will return the following response:
{ "collectionsharded" : "comments", "ok" : 1 }
In the case of comments that are fully embedded in parent content documents, the determination of the shard key is outside the scope of this document.
For hybrid documents, use the page number of the comment page in the shard key along with the discussion_id to allow MongoDB to split popular discussions between shards, while generally keeping the pages of a single discussion grouped together. Issue the following operation at the Python/PyMongo console:
>>> db.command('shardCollection', 'comment_pages', {
...     'key': { 'discussion_id': 1, 'page': 1 } })
{ "collectionsharded" : "comment_pages", "ok" : 1 }
In this tutorial, you will learn how to create a basic tumblelog application using the popular Django Python web-framework and the MongoDB database.
The tumblelog will consist of two parts:
This tutorial assumes that you are already familiar with Django, have a basic familiarity with MongoDB operations, and have installed MongoDB.
Where to get help
If you’re having trouble going through this tutorial, please post a message to mongodb-user or join the IRC chat in #mongodb on irc.freenode.net to chat with other MongoDB users who might be able to help.
Note
Django MongoDB Engine uses a forked version of Django 1.3 that adds non-relational support.
Begin by installing packages required by later steps in this tutorial.
This tutorial uses pip to install packages and virtualenv to isolate Python environments. While these tools and this configuration are not required as such, they ensure a standard environment and are strongly recommended. Issue the following commands at the system prompt:
pip install virtualenv
virtualenv myproject
Respectively, these commands install the virtualenv program (using pip) and create an isolated Python environment for this project (named myproject).
To activate the myproject environment at the system prompt, use the following command:
source myproject/bin/activate
Django MongoDB Engine directly depends on Django-nonrel and djangotoolbox. Install these packages, along with the engine itself, by issuing the following commands:
pip install https://bitbucket.org/wkornewald/django-nonrel/get/tip.tar.gz
pip install https://bitbucket.org/wkornewald/djangotoolbox/get/tip.tar.gz
pip install https://github.com/django-nonrel/mongodb-engine/tarball/master
Continue with the tutorial to begin building the “tumblelog” application.
In this tutorial you will build a basic blog as the foundation of your tumblelog application. You will add the first post using the shell and then later use the Django administrative interface.
Call the startproject command, as with other Django projects, to get started and create the basic project skeleton:
django-admin.py startproject tumblelog
Configure the database in the tumblelog/settings.py file:
DATABASES = {
    'default': {
        'ENGINE': 'django_mongodb_engine',
        'NAME': 'my_tumble_log'
    }
}
See also
The Django MongoDB Engine Settings documentation for more configuration options.
The first step in writing a tumblelog in Django is to define the “models,” or, in MongoDB’s terminology, “documents.”
In this application, you will define posts and comments, so that each Post can contain a list of Comments. Edit the tumblelog/models.py file so it resembles the following:
from django.db import models
from django.core.urlresolvers import reverse
from djangotoolbox.fields import ListField, EmbeddedModelField


class Post(models.Model):
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)
    title = models.CharField(max_length=255)
    slug = models.SlugField()
    body = models.TextField()
    comments = ListField(EmbeddedModelField('Comment'), editable=False)

    def get_absolute_url(self):
        return reverse('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title

    class Meta:
        ordering = ["-created_at"]


class Comment(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    body = models.TextField(verbose_name="Comment")
    author = models.CharField(verbose_name="Name", max_length=255)
The Django “nonrel” code looks the same as vanilla Django; however, there is no built-in support for some of MongoDB’s native data types, such as lists and embedded data. djangotoolbox handles these definitions.
See also
The Django MongoDB Engine fields documentation for more information.
The models declare one index, on the Post class’s created_at field, because the frontpage will order posts by date. There is no need to add db_index to the SlugField because SlugField creates an index by default.
The manage.py provides a shell interface for the application that you can use to insert data into the tumblelog. Begin by issuing the following command to load the Python shell:
python manage.py shell
Create the first post using the following sequence of operations:
>>> from tumblelog.models import *
>>> post = Post(
... title="Hello World!",
... slug="hello-world",
... body = "Welcome to my new shiny Tumble log powered by MongoDB and Django-MongoDB!"
... )
>>> post.save()
Add comments using the following sequence of operations:
>>> post.comments
[]
>>> comment = Comment(
... author="Joe Bloggs",
... body="Great post! I'm looking forward to reading your blog")
>>> post.comments.append(comment)
>>> post.save()
Finally, inspect the post:
>>> post = Post.objects.get()
>>> post
<Post: Hello World!>
>>> post.comments
[<Comment: Comment object>]
Because django-mongodb provides tight integration with Django, you can use generic views to display the frontpage and post pages for the tumblelog. Insert the following content into the urls.py file to add the views:
from django.conf.urls.defaults import patterns, include, url
from django.views.generic import ListView, DetailView
from tumblelog.models import Post
urlpatterns = patterns('',
    url(r'^$', ListView.as_view(
        queryset=Post.objects.all(),
        context_object_name="posts_list"),
        name="home"
    ),
    url(r'^post/(?P<slug>[a-zA-Z0-9-]+)/$', DetailView.as_view(
        queryset=Post.objects.all(),
        context_object_name="post"),
        name="post"
    ),
)
In the tumblelog directory, add the templates and templates/tumblelog directories for storing the tumblelog templates:
mkdir -p templates/tumblelog
Configure Django so it can find the templates by updating TEMPLATE_DIRS in the settings.py file to the following:
import os.path
TEMPLATE_DIRS = (
    os.path.join(os.path.dirname(os.path.realpath(__file__)), 'templates'),
)
Then add a base template that all others can inherit from. Add the following to templates/base.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>My Tumblelog</title>
<link href="http://twitter.github.com/bootstrap/1.4.0/bootstrap.css" rel="stylesheet">
<style>.content {padding-top: 80px;}</style>
</head>
<body>
<div class="topbar">
<div class="fill">
<div class="container">
<h1><a href="/" class="brand">My Tumblelog</a>! <small>Starring MongoDB and Django-MongoDB.</small></h1>
</div>
</div>
</div>
<div class="container">
<div class="content">
{% block page_header %}{% endblock %}
{% block content %}{% endblock %}
</div>
</div>
</body>
</html>
Create the frontpage for the blog, which should list all the posts. Add the following template to the templates/tumblelog/post_list.html:
{% extends "base.html" %}
{% block content %}
{% for post in posts_list %}
<h2><a href="{% url post slug=post.slug %}">{{ post.title }}</a></h2>
<p>{{ post.body|truncatewords:20 }}</p>
<p>
{{ post.created_at }} |
{% with total=post.comments|length %}
{{ total }} comment{{ total|pluralize }}
{% endwith %}
</p>
{% endfor %}
{% endblock %}
Finally, add templates/tumblelog/post_detail.html for the individual posts:
{% extends "base.html" %}
{% block page_header %}
<div class="page-header">
<h1>{{ post.title }}</h1>
</div>
{% endblock %}
{% block content %}
<p>{{ post.body }}</p>
<p>{{ post.created_at }}</p>
<hr>
<h2>Comments</h2>
{% if post.comments %}
{% for comment in post.comments %}
<p>{{ comment.body }}</p>
<p><strong>{{ comment.author }}</strong> <small>on {{ comment.created_at }}</small></p>
{% endfor %}
{% endif %}
{% endblock %}
Run python manage.py runserver to see your new tumblelog! Go to http://localhost:8000/ and you should see:
In the next step you will provide the facility for readers of the tumblelog to comment on posts. This requires a custom form, a view to handle the form and its data, and an update to the template to include the form.
You must customize form handling to deal with embedded comments. By extending ModelForm, it is possible to append the comment to the post on save. Create and add the following to forms.py:
from django.forms import ModelForm
from tumblelog.models import Comment
class CommentForm(ModelForm):

    def __init__(self, object, *args, **kwargs):
        """Override the default to store the original document
        that comments are embedded in.
        """
        self.object = object
        return super(CommentForm, self).__init__(*args, **kwargs)

    def save(self, *args):
        """Append to the comments list and save the post"""
        self.object.comments.append(self.instance)
        self.object.save()
        return self.object

    class Meta:
        model = Comment
You must extend the generic views to handle the form logic. Add the following to the views.py file:
from django.http import HttpResponseRedirect
from django.views.generic import DetailView
from tumblelog.forms import CommentForm
class PostDetailView(DetailView):
    methods = ['get', 'post']

    def get(self, request, *args, **kwargs):
        self.object = self.get_object()
        form = CommentForm(object=self.object)
        context = self.get_context_data(object=self.object, form=form)
        return self.render_to_response(context)

    def post(self, request, *args, **kwargs):
        self.object = self.get_object()
        form = CommentForm(object=self.object, data=request.POST)
        if form.is_valid():
            form.save()
            return HttpResponseRedirect(self.object.get_absolute_url())
        context = self.get_context_data(object=self.object, form=form)
        return self.render_to_response(context)
Note
The PostDetailView class extends the DetailView class so that it can handle GET and POST requests. On POST, post() validates the comment: if valid, post() appends the comment to the post.
Don’t forget to update the urls.py file and import the PostDetailView class to replace the DetailView class.
Finally, you can add the form to the templates, so that readers can create comments. Splitting the template for the forms out into templates/_forms.html will allow maximum reuse of forms code:
<fieldset>
{% for field in form.visible_fields %}
<div class="clearfix {% if field.errors %}error{% endif %}">
{{ field.label_tag }}
<div class="input">
{{ field }}
{% if field.errors or field.help_text %}
<span class="help-inline">
{% if field.errors %}
{{ field.errors|join:' ' }}
{% else %}
{{ field.help_text }}
{% endif %}
</span>
{% endif %}
</div>
</div>
{% endfor %}
{% csrf_token %}
<div style="display:none">{% for h in form.hidden_fields %} {{ h }}{% endfor %}</div>
</fieldset>
After the comments section in post_detail.html add the following code to generate the comments form:
<h2>Add a comment</h2>
<form action="." method="post">
{% include "_forms.html" %}
<div class="actions">
<input type="submit" class="btn primary" value="comment">
</div>
</form>
Your tumblelog’s readers can now comment on your posts! Run python manage.py runserver and go to http://localhost:8000/hello-world/ to see the following:
While you may always add posts using the shell interface as above, you can easily create an administrative interface for posts with Django. Enable the admin by adding django.contrib.admin to INSTALLED_APPS in settings.py.
Warning
This application does not require the Sites framework; as a result, remove django.contrib.sites from INSTALLED_APPS. If you need it later, please read the SITE_ID issues document.
Create an admin.py file and register the Post model with the admin app:
from django.contrib import admin
from tumblelog.models import Post
admin.site.register(Post)
Note
The above modifications deviate from the default django-nonrel and djangotoolbox mode of operation. Django’s administration module will not work unless you exclude the comments field. By making the comments field non-editable in the “admin” model definition, you will allow the administrative interface to function.
If you need an administrative interface for a ListField you must write your own Form / Widget.
See also
The Django Admin documentation for additional information.
Update the urls.py to enable the administrative interface. Add the import and discovery mechanism to the top of the file and then add the admin import rule to the urlpatterns:
# Enable admin
from django.contrib import admin
admin.autodiscover()
urlpatterns = patterns('',
    # ...
    url(r'^admin/', include(admin.site.urls)),
)
Finally, add a superuser and set up the indexes by issuing the following command at the system prompt:
python manage.py syncdb
Once done, run the server; you can log in to the admin interface at http://localhost:8000/admin/.
Currently, the application only supports basic posts. In this section you will add special post types, including Video, Image, and Quote, to provide a more traditional tumblelog application. Adding these types requires no migration.
In models.py update the Post class to add new fields for the new post types. Mark these fields with blank=True so that the fields can be empty.
Update Post in the models.py files to resemble the following:
POST_CHOICES = (
    ('p', 'post'),
    ('v', 'video'),
    ('i', 'image'),
    ('q', 'quote'),
)


class Post(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    title = models.CharField(max_length=255)
    slug = models.SlugField()
    comments = ListField(EmbeddedModelField('Comment'), editable=False)

    post_type = models.CharField(max_length=1, choices=POST_CHOICES, default='p')
    body = models.TextField(blank=True, help_text="The body of the Post / Quote")
    embed_code = models.TextField(blank=True, help_text="The embed code for video")
    image_url = models.URLField(blank=True, help_text="Image src")
    author = models.CharField(blank=True, max_length=255, help_text="Author name")

    def get_absolute_url(self):
        return reverse('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title
Note
Django-Nonrel doesn’t support multi-table inheritance. This means that you will have to manually create an administrative form to handle data validation for the different post types.
The “Abstract Inheritance” facility means that the view logic would need to merge data from multiple collections.
The administrative interface should now handle adding multiple types of post. To conclude this process, you must update the frontend display to handle and output the different post types.
In the post_list.html file, change the post output display to resemble the following:
{% if post.post_type == 'p' %}
<p>{{ post.body|truncatewords:20 }}</p>
{% endif %}
{% if post.post_type == 'v' %}
{{ post.embed_code|safe }}
{% endif %}
{% if post.post_type == 'i' %}
<p><img src="{{ post.image_url }}" /></p>
{% endif %}
{% if post.post_type == 'q' %}
<blockquote>{{ post.body|truncatewords:20 }}</blockquote>
<p>{{ post.author }}</p>
{% endif %}
In the post_detail.html file, change the output for full posts:
{% if post.post_type == 'p' %}
<p>{{ post.body }}</p>
{% endif %}
{% if post.post_type == 'v' %}
{{ post.embed_code|safe }}
{% endif %}
{% if post.post_type == 'i' %}
<p><img src="{{ post.image_url }}" /></p>
{% endif %}
{% if post.post_type == 'q' %}
<blockquote>{{ post.body }}</blockquote>
<p>{{ post.author }}</p>
{% endif %}
Now you have a fully fledged tumblelog using Django and MongoDB!
This tutorial describes the process for creating a basic tumblelog application using the popular Flask Python web-framework in conjunction with the MongoDB database.
The tumblelog will consist of two parts:
This tutorial assumes that you are already familiar with Flask, have a basic familiarity with MongoDB, and have installed MongoDB. This tutorial uses MongoEngine as the Object Document Mapper (ODM); this component may simplify the interaction between Flask and MongoDB.
Where to get help
If you’re having trouble going through this tutorial, please post a message to mongodb-user or join the IRC chat in #mongodb on irc.freenode.net to chat with other MongoDB users who might be able to help.
Begin by installing packages required by later steps in this tutorial.
This tutorial uses pip to install packages and virtualenv to isolate Python environments. While these tools and this configuration are not strictly required, they ensure a standard environment and are strongly recommended. Issue the following commands at the system prompt:
pip install virtualenv
virtualenv myproject
Respectively, these commands install the virtualenv program (using pip) and create an isolated Python environment for this project (named myproject).
To activate the myproject environment at the system prompt, use the following command:
source myproject/bin/activate
Flask is a "microframework" because it provides a small core of functionality and is highly extensible. For the "tumblelog" project, this tutorial uses the Flask-Script task manager and the following extensions: WTForms for form handling and Flask-MongoEngine for integrating MongoEngine with Flask.
Install with the following commands:
pip install flask
pip install flask-script
pip install WTForms
pip install mongoengine
pip install flask_mongoengine
Continue with the tutorial to begin building the “tumblelog” application.
First, create a simple "bare bones" application. Make a directory named tumblelog for the project, and then add the following content to a file named __init__.py:
from flask import Flask
app = Flask(__name__)

if __name__ == '__main__':
    app.run()
Next, create the manage.py file. [1] Use this file to load additional Flask scripts in the future. Flask-Script provides a development server and shell:
# Set the path
import os, sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from flask.ext.script import Manager, Server
from tumblelog import app

manager = Manager(app)

# Turn on debugger by default and reloader
manager.add_command("runserver", Server(
    use_debugger=True,
    use_reloader=True,
    host='0.0.0.0')
)

if __name__ == "__main__":
    manager.run()
You can run this application with a test server, by issuing the following command at the system prompt:
python manage.py runserver
There should be no errors, and you can visit http://localhost:5000/ in a web browser to view a page with a “404” message.
[1] This concept will be familiar to users of Django.
Install the Flask extension and add the configuration. Update tumblelog/__init__.py so that it resembles the following:
from flask import Flask
from flask.ext.mongoengine import MongoEngine

app = Flask(__name__)
app.config["MONGODB_DB"] = "my_tumble_log"
app.config["SECRET_KEY"] = "KeepThisS3cr3t"

db = MongoEngine(app)

if __name__ == '__main__':
    app.run()
See also
The MongoEngine Settings documentation for additional configuration options.
The first step in writing a tumblelog in Flask is to define the "models," or in MongoDB's terminology, documents.
In this application, you will define posts and comments, so that each Post can contain a list of Comments. Edit the models.py file so that it resembles the following:
import datetime
from flask import url_for
from tumblelog import db


class Post(db.Document):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    title = db.StringField(max_length=255, required=True)
    slug = db.StringField(max_length=255, required=True)
    body = db.StringField(required=True)
    comments = db.ListField(db.EmbeddedDocumentField('Comment'))

    def get_absolute_url(self):
        return url_for('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title

    meta = {
        'allow_inheritance': True,
        'indexes': ['-created_at', 'slug'],
        'ordering': ['-created_at']
    }


class Comment(db.EmbeddedDocument):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    body = db.StringField(verbose_name="Comment", required=True)
    author = db.StringField(verbose_name="Name", max_length=255, required=True)
MongoEngine syntax is simple and declarative; if you have a Django background, it will look familiar. This example defines two indexes for Post: one on the created_at date, because the front page orders posts by date, and one on the individual post slug.
The manage.py file provides a shell interface for the application that you can use to insert data into the tumblelog. Before configuring the "urls" and "views" for this application, you can use this interface to interact with your tumblelog. Begin by issuing the following command to load the Python shell:
python manage.py shell
Create the first post using the following sequence of operations:
>>> from tumblelog.models import *
>>> post = Post(
... title="Hello World!",
... slug="hello-world",
... body="Welcome to my new shiny Tumble log powered by MongoDB, MongoEngine, and Flask"
... )
>>> post.save()
Add comments using the following sequence of operations:
>>> post.comments
[]
>>> comment = Comment(
... author="Joe Bloggs",
... body="Great post! I'm looking forward to reading your blog!"
... )
>>> post.comments.append(comment)
>>> post.save()
Finally, inspect the post:
>>> post = Post.objects.get()
>>> post
<Post: Hello World!>
>>> post.comments
[<Comment: Comment object>]
Using Flask’s class-based views system allows you to produce List and Detail views for tumblelog posts. Add views.py and create a posts blueprint:
from flask import Blueprint, request, redirect, render_template, url_for
from flask.views import MethodView
from tumblelog.models import Post, Comment
posts = Blueprint('posts', __name__, template_folder='templates')
class ListView(MethodView):

    def get(self):
        posts = Post.objects.all()
        return render_template('posts/list.html', posts=posts)


class DetailView(MethodView):

    def get(self, slug):
        post = Post.objects.get_or_404(slug=slug)
        return render_template('posts/detail.html', post=post)


# Register the urls
posts.add_url_rule('/', view_func=ListView.as_view('list'))
posts.add_url_rule('/<slug>/', view_func=DetailView.as_view('detail'))
Now in __init__.py register the blueprint, avoiding a circular dependency by registering the blueprints in a method. Add the following code to the module:
def register_blueprints(app):
    # Prevents circular imports
    from tumblelog.views import posts
    app.register_blueprint(posts)

register_blueprints(app)
Add this method and method call to the main body of the module and not in the main block.
In the tumblelog directory add the templates and templates/posts directories to store the tumblelog templates:
mkdir -p templates/posts
Create a base template. All other templates will inherit from this template, which should exist in the templates/base.html file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>My Tumblelog</title>
<link href="http://twitter.github.com/bootstrap/1.4.0/bootstrap.css" rel="stylesheet">
<style>.content {padding-top: 80px;}</style>
</head>
<body>
{%- block topbar -%}
<div class="topbar">
<div class="fill">
<div class="container">
<h2>
<a href="/" class="brand">My Tumblelog</a> <small>Starring Flask, MongoDB and MongoEngine</small>
</h2>
</div>
</div>
</div>
{%- endblock -%}
<div class="container">
<div class="content">
{% block page_header %}{% endblock %}
{% block content %}{% endblock %}
</div>
</div>
{% block js_footer %}{% endblock %}
</body>
</html>
Continue by creating a landing page for the blog that will list all posts. Add the following to the templates/posts/list.html file:
{% extends "base.html" %}
{% block content %}
{% for post in posts %}
<h2><a href="{{ url_for('posts.detail', slug=post.slug) }}">{{ post.title }}</a></h2>
<p>{{ post.body|truncate(100) }}</p>
<p>
{{ post.created_at.strftime('%H:%M %Y-%m-%d') }} |
{% with total=post.comments|length %}
{{ total }} comment {%- if total != 1 %}s{%- endif -%}
{% endwith %}
</p>
{% endfor %}
{% endblock %}
Finally, add templates/posts/detail.html template for the individual posts:
{% extends "base.html" %}
{% block page_header %}
<div class="page-header">
<h1>{{ post.title }}</h1>
</div>
{% endblock %}
{% block content %}
<p>{{ post.body }}</p>
<p>{{ post.created_at.strftime('%H:%M %Y-%m-%d') }}</p>
<hr>
<h2>Comments</h2>
{% if post.comments %}
{% for comment in post.comments %}
<p>{{ comment.body }}</p>
<p><strong>{{ comment.author }}</strong> <small>on {{ comment.created_at.strftime('%H:%M %Y-%m-%d') }}</small></p>
{% endfor %}
{% endif %}
{% endblock %}
At this point, you can run the python manage.py runserver command again to see your new tumblelog! Go to http://localhost:5000 to see something that resembles the following:
In the next step you will provide the facility for readers of the tumblelog to comment on posts. To provide commenting, you will create a form using WTForms, update the view to handle the form data, and update the template to include the form.
Begin by updating and refactoring the views.py file so that it can handle the form: add the import statement and replace the DetailView class with the following:
from flask.ext.mongoengine.wtf import model_form

...

class DetailView(MethodView):

    form = model_form(Comment, exclude=['created_at'])

    def get_context(self, slug):
        post = Post.objects.get_or_404(slug=slug)
        form = self.form(request.form)

        context = {
            "post": post,
            "form": form
        }
        return context

    def get(self, slug):
        context = self.get_context(slug)
        return render_template('posts/detail.html', **context)

    def post(self, slug):
        context = self.get_context(slug)
        form = context.get('form')

        if form.validate():
            comment = Comment()
            form.populate_obj(comment)

            post = context.get('post')
            post.comments.append(comment)
            post.save()

            return redirect(url_for('posts.detail', slug=slug))
        return render_template('posts/detail.html', **context)
Note
DetailView extends the default Flask MethodView. This code remains DRY by defining a get_context method to get the default context for both GET and POST requests. On POST, post() validates the comment: if valid, post() appends the comment to the post.
Finally, you can add the form to the templates so that readers can create comments. Creating a macro for the forms in templates/_forms.html will allow you to reuse the form code:
{% macro render(form) -%}
<fieldset>
{% for field in form %}
{% if field.type in ['CSRFTokenField', 'HiddenField'] %}
{{ field() }}
{% else %}
<div class="clearfix {% if field.errors %}error{% endif %}">
{{ field.label }}
<div class="input">
{% if field.name == "body" %}
{{ field(rows=10, cols=40) }}
{% else %}
{{ field() }}
{% endif %}
{% if field.errors or field.help_text %}
<span class="help-inline">
{% if field.errors %}
{{ field.errors|join(' ') }}
{% else %}
{{ field.help_text }}
{% endif %}
</span>
{% endif %}
</div>
</div>
{% endif %}
{% endfor %}
</fieldset>
{% endmacro %}
Add the comments form to templates/posts/detail.html. Insert an import statement at the top of the page and then output the form after displaying comments:
{% import "_forms.html" as forms %}
...
<hr>
<h2>Add a comment</h2>
<form action="." method="post">
{{ forms.render(form) }}
<div class="actions">
<input type="submit" class="btn primary" value="comment">
</div>
</form>
Your tumblelog’s readers can now comment on your posts! Run python manage.py runserver to see the changes.
While you may always add posts using the shell interface as above, in this step you will add an administrative interface for the tumblelog site. To add the administrative interface you will add authentication and an additional view. This tutorial only addresses adding and editing posts: a “delete” view and detection of slug collisions are beyond the scope of this tutorial.
For the purposes of this tutorial all we need is a very basic form of authentication. The following example borrows from an example Flask "Auth snippet." Create the file auth.py with the following content:
from functools import wraps
from flask import request, Response


def check_auth(username, password):
    """This function is called to check if a username /
    password combination is valid.
    """
    return username == 'admin' and password == 'secret'


def authenticate():
    """Sends a 401 response that enables basic auth"""
    return Response(
        'Could not verify your access level for that URL.\n'
        'You have to login with proper credentials', 401,
        {'WWW-Authenticate': 'Basic realm="Login Required"'})


def requires_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        auth = request.authorization
        if not auth or not check_auth(auth.username, auth.password):
            return authenticate()
        return f(*args, **kwargs)
    return decorated
Note
This creates a requires_auth decorator that provides basic authentication. Decorate any view that needs authentication with this decorator. The username is admin and the password is secret.
Create the views and admin blueprint in admin.py. The following view is deliberately generic, to facilitate customization.
from flask import Blueprint, request, redirect, render_template, url_for
from flask.views import MethodView
from flask.ext.mongoengine.wtf import model_form

from tumblelog.auth import requires_auth
from tumblelog.models import Post, Comment

admin = Blueprint('admin', __name__, template_folder='templates')


class List(MethodView):
    decorators = [requires_auth]
    cls = Post

    def get(self):
        posts = self.cls.objects.all()
        return render_template('admin/list.html', posts=posts)


class Detail(MethodView):

    decorators = [requires_auth]

    def get_context(self, slug=None):
        form_cls = model_form(Post, exclude=('created_at', 'comments'))

        if slug:
            post = Post.objects.get_or_404(slug=slug)
            if request.method == 'POST':
                form = form_cls(request.form, initial=post._data)
            else:
                form = form_cls(obj=post)
        else:
            post = Post()
            form = form_cls(request.form)

        context = {
            "post": post,
            "form": form,
            "create": slug is None
        }
        return context

    def get(self, slug):
        context = self.get_context(slug)
        return render_template('admin/detail.html', **context)

    def post(self, slug):
        context = self.get_context(slug)
        form = context.get('form')

        if form.validate():
            post = context.get('post')
            form.populate_obj(post)
            post.save()

            return redirect(url_for('admin.index'))
        return render_template('admin/detail.html', **context)


# Register the urls
admin.add_url_rule('/admin/', view_func=List.as_view('index'))
admin.add_url_rule('/admin/create/', defaults={'slug': None}, view_func=Detail.as_view('create'))
admin.add_url_rule('/admin/<slug>/', view_func=Detail.as_view('edit'))
Note
Here, the List and Detail views are similar to the frontend of the site; however, requires_auth decorates both views.
The “Detail” view is slightly more complex: to set the context, this view checks for a slug and if there is no slug, Detail uses the view for creating a new post. If a slug exists, Detail uses the view for editing an existing post.
In the __init__.py file update the register_blueprints() method to import the new admin blueprint.
def register_blueprints(app):
    # Prevents circular imports
    from tumblelog.views import posts
    from tumblelog.admin import admin
    app.register_blueprint(posts)
    app.register_blueprint(admin)
Similar to the user-facing portion of the site, the administrative section of the application requires three templates: a base template, a list view, and a detail view.
Create an admin directory for the templates. Add a simple main index page for the admin in the templates/admin/base.html file:
{% extends "base.html" %}
{%- block topbar -%}
<div class="topbar" data-dropdown="dropdown">
<div class="fill">
<div class="container">
<h2>
<a href="{{ url_for('admin.index') }}" class="brand">My Tumblelog Admin</a>
</h2>
<ul class="nav secondary-nav">
<li class="menu">
<a href="{{ url_for("admin.create") }}" class="btn primary">Create new post</a>
</li>
</ul>
</div>
</div>
</div>
{%- endblock -%}
List all the posts in the templates/admin/list.html file:
{% extends "admin/base.html" %}
{% block content %}
<table class="condensed-table zebra-striped">
<thead>
<th>Title</th>
<th>Created</th>
<th>Actions</th>
</thead>
<tbody>
{% for post in posts %}
<tr>
<th><a href="{{ url_for('admin.edit', slug=post.slug) }}">{{ post.title }}</a></th>
<td>{{ post.created_at.strftime('%Y-%m-%d') }}</td>
<td><a href="{{ url_for("admin.edit", slug=post.slug) }}" class="btn primary">Edit</a></td>
</tr>
{% endfor %}
</tbody>
</table>
{% endblock %}
Add a template to create and edit posts in the templates/admin/detail.html file:
{% extends "admin/base.html" %}
{% import "_forms.html" as forms %}
{% block content %}
<h2>
{% if create %}
Add new Post
{% else %}
Edit Post
{% endif %}
</h2>
<form action="?{{ request.query_string }}" method="post">
{{ forms.render(form) }}
<div class="actions">
<input type="submit" class="btn primary" value="save">
<a href="{{ url_for("admin.index") }}" class="btn secondary">Cancel</a>
</div>
</form>
{% endblock %}
The administrative interface is ready for use. Restart the test server (i.e. runserver) so that you can log in to the administrative interface located at http://localhost:5000/admin/. (The username is admin and the password is secret.)
Currently, the application only supports posts. In this section you will add special post types including: Video, Image and Quote to provide a more traditional tumblelog application. Adding this data requires no migration because MongoEngine supports document inheritance.
Begin by refactoring the Post class to operate as a base class and create new classes for the new post types. Update the models.py file to include the code to replace the old Post class:
class Post(db.DynamicDocument):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    title = db.StringField(max_length=255, required=True)
    slug = db.StringField(max_length=255, required=True)
    comments = db.ListField(db.EmbeddedDocumentField('Comment'))

    def get_absolute_url(self):
        return url_for('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title

    @property
    def post_type(self):
        return self.__class__.__name__

    meta = {
        'allow_inheritance': True,
        'indexes': ['-created_at', 'slug'],
        'ordering': ['-created_at']
    }


class BlogPost(Post):
    body = db.StringField(required=True)


class Video(Post):
    embed_code = db.StringField(required=True)


class Image(Post):
    image_url = db.StringField(required=True, max_length=255)


class Quote(Post):
    body = db.StringField(required=True)
    author = db.StringField(verbose_name="Author Name", required=True, max_length=255)
Note
In the Post class the post_type helper returns the class name, which will make it possible to render the various different post types in the templates.
As MongoEngine handles returning the correct classes when fetching Post objects you do not need to modify the interface view logic: only modify the templates.
Update the templates/posts/list.html file and change the post output format as follows:
{% if post.body %}
{% if post.post_type == 'Quote' %}
<blockquote>{{ post.body|truncate(100) }}</blockquote>
<p>{{ post.author }}</p>
{% else %}
<p>{{ post.body|truncate(100) }}</p>
{% endif %}
{% endif %}
{% if post.embed_code %}
{{ post.embed_code|safe() }}
{% endif %}
{% if post.image_url %}
<p><img src="{{ post.image_url }}" /></p>
{% endif %}
In the templates/posts/detail.html change the output for full posts as follows:
{% if post.body %}
{% if post.post_type == 'Quote' %}
<blockquote>{{ post.body }}</blockquote>
<p>{{ post.author }}</p>
{% else %}
<p>{{ post.body }}</p>
{% endif %}
{% endif %}
{% if post.embed_code %}
{{ post.embed_code|safe() }}
{% endif %}
{% if post.image_url %}
<p><img src="{{ post.image_url }}" /></p>
{% endif %}
In this section you will update the administrative interface to support the new post types.
Begin by updating the admin.py file to import the new document models, and then update get_context() in the Detail class to dynamically create the correct model form to use:
from tumblelog.models import Post, BlogPost, Video, Image, Quote, Comment

# ...

class Detail(MethodView):

    decorators = [requires_auth]
    # Map post types to models
    class_map = {
        'post': BlogPost,
        'video': Video,
        'image': Image,
        'quote': Quote,
    }

    def get_context(self, slug=None):

        if slug:
            post = Post.objects.get_or_404(slug=slug)
            # Handle old posts types as well
            cls = post.__class__ if post.__class__ != Post else BlogPost
            form_cls = model_form(cls, exclude=('created_at', 'comments'))
            if request.method == 'POST':
                form = form_cls(request.form, initial=post._data)
            else:
                form = form_cls(obj=post)
        else:
            # Determine which post type we need
            cls = self.class_map.get(request.args.get('type', 'post'))
            post = cls()
            form_cls = model_form(cls, exclude=('created_at', 'comments'))
            form = form_cls(request.form)

        context = {
            "post": post,
            "form": form,
            "create": slug is None
        }
        return context
# ...
Update the templates/admin/base.html file to create a new post drop-down menu in the toolbar:
{% extends "base.html" %}
{%- block topbar -%}
<div class="topbar" data-dropdown="dropdown">
<div class="fill">
<div class="container">
<h2>
<a href="{{ url_for('admin.index') }}" class="brand">My Tumblelog Admin</a>
</h2>
<ul class="nav secondary-nav">
<li class="menu">
<a href="#" class="menu">Create new</a>
<ul class="menu-dropdown">
{% for type in ('post', 'video', 'image', 'quote') %}
<li><a href="{{ url_for("admin.create", type=type) }}">{{ type|title }}</a></li>
{% endfor %}
</ul>
</li>
</ul>
</div>
</div>
</div>
{%- endblock -%}
{% block js_footer %}
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="http://twitter.github.com/bootstrap/1.4.0/bootstrap-dropdown.js"></script>
{% endblock %}
Now you have a fully fledged tumblelog using Flask and MongoEngine!
The complete source code is available on Github: <https://github.com/rozza/flask-tumblelog>
This document answers basic questions about MongoDB.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
Frequently Asked Questions:
MongoDB is a document-oriented DBMS. Think of MySQL but with JSON-like objects comprising the data model, rather than RDBMS tables. Significantly, MongoDB supports neither joins nor transactions. However, it features secondary indexes, an expressive query language, atomic writes on a per-document level, and fully-consistent reads.
Operationally, MongoDB features master-slave replication with automated failover and built-in horizontal scaling via automated range-based partitioning.
Instead of tables, a MongoDB database stores its data in collections, which are the rough equivalent of RDBMS tables. A collection holds one or more documents, each of which corresponds to a record or row in a relational database table; each document has one or more fields, which correspond to the columns in a relational database table.
Collections have some important differences from RDBMS tables:
MongoDB uses dynamic schemas. You can create collections without defining the structure, i.e. the fields or the types of their values, of the documents in the collection. You can change the structure of documents simply by adding new fields or deleting existing ones. Documents in a collection need not have an identical set of fields.
In practice, it is common for the documents in a collection to have a largely homogeneous structure; however, this is not a requirement. MongoDB's flexible schemas mean that schema migration and augmentation are very easy in practice, and you will rarely, if ever, need to write scripts that perform "alter table" type operations. This simplifies and facilitates iterative software development with MongoDB.
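To illustrate the dynamic-schema point concretely (the field names below are hypothetical, for illustration only), two documents destined for the same collection need not share any structure beyond what the application expects:

```python
# Two documents for one collection; MongoDB does not require them to
# share a structure, and no "alter table" step is needed when new
# documents introduce new fields.
people = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Alan", "languages": ["python", "lisp"], "active": True},
]

# Each document carries its own set of fields.
fields_per_doc = [sorted(doc.keys()) for doc in people]
```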
MongoDB client drivers exist for all of the most popular programming languages, and many of the less popular ones. See the latest list of drivers for details.
See also
“Drivers.”
No.
However, MongoDB does support a rich, ad-hoc query language of its own.
See also
The query “Operator Reference” document and the Query Overview and the Tour pages from the wiki.
MongoDB has a general-purpose design, making it appropriate for a large number of use cases. Examples include content management systems, mobile applications, gaming, e-commerce, analytics, archiving, and logging.
Do not use MongoDB for systems that require SQL, joins, and multi-object transactions.
MongoDB does not provide ACID transactions.
However, MongoDB does provide some basic transactional capabilities. Atomic operations are possible within the scope of a single document: that is, you can debit a and credit b as a transaction if they are fields within the same document. Documents can be rich (some contain thousands of fields), and atomic operations can test fields in sub-documents.
Additionally, you can make writes in MongoDB durable (the ‘D’ in ACID). To get durable writes, you must enable journaling, which is on by default in 64-bit builds. You must also issue writes with a write concern of {j: true} to ensure that the writes block until the journal has synced to disk.
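As a sketch of how this looks at the command level (the raw document below illustrates the getLastError mechanism of this era; drivers expose the same thing through write-concern options, so treat the exact shape as an assumption rather than a recipe):

```python
# A driver requesting journaled durability follows each write with a
# getLastError command carrying j: true; the call does not return until
# the journal has synced to disk.
get_last_error = {"getlasterror": 1, "j": True}

# Without j: true, the write is acknowledged before the journal syncs,
# so a crash inside the ~100 ms commit window could lose it.
```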
Users have built successful e-commerce systems using MongoDB, but applications requiring multi-object commits with rollback generally aren't feasible.
Not necessarily. It’s certainly possible to run MongoDB on a machine with a small amount of free RAM.
MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server's RAM, MongoDB will yield cached memory to the other process.
Technically, the operating system’s virtual memory subsystem manages MongoDB’s memory. This means that MongoDB will use as much free memory as it can, swapping to disk as needed. Deployments with enough memory to fit the application’s working data set in RAM will achieve the best performance.
MongoDB has no configurable cache. MongoDB uses all free memory on the system automatically by way of memory-mapped files. Operating systems use the same approach with their file system caches.
Writes are physically written to the journal within 100 milliseconds. At that point, the write is “durable” in the sense that after a pull-plug-from-wall event, the data will still be recoverable after a hard restart.
While the journal commit is nearly instant, MongoDB writes to the data files lazily. MongoDB may wait to write data to the data files for as much as one minute. This does not affect durability, as the journal has enough information to ensure crash recovery.
No. In MongoDB, a document’s representation in the database is similar to its representation in application memory. This means the database already stores the usable form of data, making the data usable in both the persistent store and in the application cache. This eliminates the need for a separate caching layer in the application.
This differs from relational databases, where caching data is more expensive. Relational databases must transform data into object representations that applications can read, and must store the transformed data in a separate cache. If these transformations from data to application objects require joins, this process increases the overhead of using the database and increases the importance of the caching layer.
Yes. MongoDB keeps all of the most recently used data in RAM. If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.
MongoDB does not implement a query cache: MongoDB serves all queries directly from the indexes and/or data files.
MongoDB is implemented in C++. Drivers and client libraries are typically written in their respective languages, although some drivers use C extensions for better performance.
MongoDB uses memory-mapped files. When running a 32-bit build of MongoDB, the total storage size for the server, including data and indexes, is 2 gigabytes. For this reason, do not deploy MongoDB to production on 32-bit machines.
If you’re running a 64-bit build of MongoDB, there’s virtually no limit to storage size. For production deployments, 64-bit builds and operating systems are strongly recommended.
See also
Note
32-bit builds disable journaling by default because journaling further limits the maximum amount of data that the database can store.
This document answers common questions about application development using MongoDB.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
Frequently Asked Questions:
A “namespace” is the concatenation of the database name and the collection names with a period character in between.
Collections are containers for documents that share one or more indexes. Databases are groups of collections stored on disk using a single set of data files.
For an example namespace acme.users, acme is the database name and users is the collection name. Period characters can occur in collection names, so acme.user.history is also a valid namespace, with acme as the database name and user.history as the collection name.
While data models like this appear to support nested collections, the collection namespace is flat, and there is no difference from the perspective of MongoDB between acme, acme.users, and acme.records.
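Since only the first period separates the database from the collection, a short helper (hypothetical, for illustration only) makes the split rule concrete:

```python
def split_namespace(namespace):
    """Split a MongoDB namespace into (database, collection).

    Only the first period separates the database name from the
    collection name; any later periods belong to the collection.
    """
    database, _, collection = namespace.partition(".")
    return database, collection

# "acme.user.history" -> database "acme", collection "user.history"
```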
In the mongo shell, you can use the following operation to duplicate the entire collection:
db.people.find().forEach( function(x){db.user.insert(x)} );
Note
Because this process decodes BSON documents to JSON during the copy procedure, you may incur a loss of type fidelity.
Consider using mongodump and mongorestore to maintain type fidelity.
Also consider the cloneCollection command that may provide some of this functionality.
Yes.
When you use db.collection.remove(), the object will no longer exist in MongoDB’s on-disk data storage.
MongoDB flushes writes to disk on a regular interval. In the default configuration, MongoDB writes data to the main data files on disk every 60 seconds and commits the journal every 100 milliseconds. These values are configurable with the journalCommitInterval and syncdelay settings.
These values represent the maximum amount of time between the completion of a write operation and the point when the write is durable in the journal, if enabled, and when MongoDB flushes data to the disk. In many cases MongoDB and the operating system flush data to disk more frequently, so the above values represent a theoretical maximum.
However, by default, MongoDB uses a "lazy" strategy to write to disk. This is advantageous in situations such as when the database receives a thousand increments to an object within one second: MongoDB only needs to flush this data to disk once. In addition to the aforementioned configuration options, you can also use fsync and getLastError to modify this strategy.
MongoDB does not have support for traditional locking or complex transactions with rollback. MongoDB aims to be lightweight, fast, and predictable in its performance. This is similar to the MySQL MyISAM autocommit model. By keeping transaction support extremely simple, MongoDB can provide greater performance especially for partitioned or replicated systems with a number of database server processes.
MongoDB does have support for atomic operations within a single document. Given the possibilities provided by nested documents, this feature provides support for a large number of use-cases.
See also
The Atomic Operations wiki page.
In version 2.1 and later, you can use the new “aggregation framework” with the aggregate command.
MongoDB also supports map-reduce with the mapReduce command, as well as basic aggregation with the group, count, and distinct commands.
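For example, a sketch in the mongo shell, assuming a hypothetical orders collection with status, cust_id, and amount fields:

```javascript
// Aggregation framework: total amount per customer for active orders.
db.orders.aggregate(
    { $match: { status: "A" } },
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
)

// Basic aggregation with the older helpers.
db.orders.count( { status: "A" } )
db.orders.distinct( "cust_id" )
```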
See also
The Aggregation wiki page.
If you see a very large number of connection and re-connection messages in your MongoDB log, then clients are frequently connecting and disconnecting from the MongoDB server. This is normal behavior for applications that do not use request pooling, such as CGI. Consider using FastCGI, an Apache module, or some other kind of persistent application server to decrease the connection overhead.
If these connections do not impact your performance you can use the run-time quiet option or the command-line option --quiet to suppress these messages from the log.
Yes.
MongoDB users of all sizes have had a great deal of success using MongoDB on the EC2 platform using EBS disks.
See also
The “MongoDB on the Amazon Platform” wiki page.
MongoDB aggressively preallocates data files to reserve space and avoid file system fragmentation. You can use the smallfiles flag to modify the file preallocation strategy.
See also
The wiki page that addresses MongoDB disk use.
Each MongoDB document contains a certain amount of overhead. This overhead is normally insignificant but becomes significant if all documents are just a few bytes, as might be the case if the documents in your collection only have one or two fields.
Consider the following suggestions and strategies for optimizing storage utilization for these collections:
Use the _id field explicitly.
MongoDB clients automatically add an _id field to each document and generate a unique 12-byte ObjectId for the _id field. Furthermore, MongoDB always indexes the _id field. For smaller documents this may account for a significant amount of space.
To optimize storage use, users can specify a value for the _id field explicitly when inserting documents into the collection. This strategy allows applications to store a value in the _id field that would have occupied space in another portion of the document.
You can store any value in the _id field, but because this value serves as a primary key for documents in the collection, it must uniquely identify them. If the field’s value is not unique, then it cannot serve as a primary key as there would be collisions in the collection.
Use shorter field names.
MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space. Consider a collection of documents that resemble the following:
{ last_name : "Smith", best_score: 3.9 }
If you shorten the field named last_name to lname and the field named best_score to score, as follows, you could save 9 bytes per document.
{ lname : "Smith", score : 3.9 }
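The 9-byte figure follows from the field-name lengths alone: BSON stores each field name inline in every document, so renaming a field saves the difference in name length per document. A quick sketch (the helper function is illustrative, not a MongoDB API):

```javascript
// Savings per document from renaming fields: BSON stores each field
// name inline, so each rename saves (old length - new length) bytes.
function bytesSaved(renames) {
  return renames.reduce(
    function (total, pair) { return total + (pair[0].length - pair[1].length); },
    0
  );
}

bytesSaved([ ["last_name", "lname"], ["best_score", "score"] ]);  // 9
```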
Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents, where document overhead is not a significant concern. Shorter field names do not reduce the size of indexes, because indexes have a predefined structure.
In general it is not necessary to use short field names.
Embed documents.
In some cases you may want to embed documents in other documents and save on the per-document overhead.
For documents in a MongoDB collection, you should always use GridFS for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing each file manually within a single document. You may use the BinData data type to store the binary data. See your driver’s documentation for details on using BinData.
For more information on GridFS, see GridFS.
As a client program assembles a query in MongoDB, it builds a BSON object, not a string. Thus traditional SQL injection attacks are not a problem. More details and some nuances are covered below.
MongoDB represents queries as BSON objects. Typically client libraries provide a convenient, injection free, process to build these objects. Consider the following C++ example:
BSONObj my_query = BSON( "name" << a_name );
auto_ptr<DBClientCursor> cursor = c.query("tutorial.persons", my_query);
Here, my_query will then have a value such as { name : "Joe" }. If my_query contained special characters, for example ,, :, and {, the query simply wouldn’t match any documents. For example, users cannot hijack a query and convert it to a delete.
All of the following MongoDB operations permit you to run arbitrary JavaScript expressions directly on the server: $where, db.eval(), mapReduce, and group.
You must exercise care in these cases to prevent users from submitting malicious JavaScript.
Fortunately, you can express most queries in MongoDB without JavaScript and for queries that require JavaScript, you can mix JavaScript and non-JavaScript in a single query. Place all the user-supplied fields directly in a BSON field and pass JavaScript code to the $where field.
If you need to pass user-supplied values in a $where clause, you may escape these values with the CodeWScope mechanism. When you set user-submitted values as variables in the scope document, you can avoid evaluating them on the database server.
If you need to use db.eval() with user supplied values, you can either use a CodeWScope or you can supply extra arguments to your function. For instance:
db.eval(function(userVal){...},
user_value);
This will ensure that your application sends user_value to the database server as data rather than code.
Field names in MongoDB’s query language have semantic meaning. The dollar sign (i.e. $) is a reserved character used to represent operators (i.e. $inc). Thus, you should ensure that your application’s users cannot inject operators into their inputs.
In some cases, you may wish to build a BSON object with a user-provided key. In these situations, you will need to substitute the reserved $ and . characters in keys. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. “＄”) and U+FF0E (i.e. “．”).
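The substitution can be sketched as a small helper that maps the reserved characters to their full width equivalents (illustrative only, not part of any driver API):

```javascript
// Replace the reserved $ (U+0024) and . (U+002E) in user-supplied keys
// with their Unicode full width equivalents U+FF04 and U+FF0E.
function escapeKey(key) {
  return key.replace(/\$/g, "\uFF04").replace(/\./g, "\uFF0E");
}

escapeKey("$where");     // "＄where"
escapeKey("price.usd");  // "price．usd"
```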
Consider the following example:
BSONObj my_object = BSON( a_key << a_name );
The user may have supplied a $ value in the a_key value. At the same time, my_object might be { $where : "things" }. Consider the following cases:
Insert. Inserting this into the database does no harm. The insert process does not evaluate the object as a query.
Note
MongoDB client drivers, if properly implemented, check for reserved characters in keys on inserts.
Update. The db.collection.update() operation permits $ operators in the update argument but does not support the $where operator. Still, some users may be able to inject operators that can manipulate a single document only. Therefore your application should escape keys, as mentioned above, if reserved characters are possible.
Query. Generally this is not a problem for queries that resemble { x : user_obj }: dollar signs are not top level and have no effect. Theoretically it may be possible for the user to build a query themselves. But checking the user-submitted content for $ characters in key names may help protect against this kind of injection.
See the “PHP MongoDB Driver Security Notes” page in the PHP driver documentation for more information.
MongoDB implements a readers-writer lock. This means that at any one time, only one client may be writing or any number of clients may be reading, but that reading and writing cannot occur simultaneously.
In standalone deployments and replica sets, the lock’s scope applies to a single mongod instance or to the primary instance of the set. In a sharded cluster, locks apply to each individual shard, not to the whole cluster.
For more information, see FAQ: Concurrency.
MongoDB permits documents within a single collection to have fields with different BSON types. For instance, the following documents may exist within a single collection.
{ x: "string" }
{ x: 42 }
When comparing values of different BSON types, MongoDB uses the following compare order:
Note
MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.
Consider the following mongo example:
db.test.insert({x:3});
db.test.insert( {x : 2.9} );
db.test.insert( {x : new Date()} );
db.test.insert( {x : true } );
db.test.find().sort({x:1});
{ "_id" : ObjectId("4b03155dce8de6586fb002c7"), "x" : 2.9 }
{ "_id" : ObjectId("4b03154cce8de6586fb002c6"), "x" : 3 }
{ "_id" : ObjectId("4b031566ce8de6586fb002c9"), "x" : true }
{ "_id" : ObjectId("4b031563ce8de6586fb002c8"), "x" : "Tue Nov 17 2009 16:28:03 GMT-0500 (EST)" }
The $type operator provides access to BSON type comparison in the MongoDB query syntax. See the documentation on BSON types and the $type operator for additional information.
Warning
Storing values of different types in the same field in a collection is strongly discouraged.
Fields in a document may store null values, as in a notional collection, test, with the following documents:
{ _id: 1, cancelDate: null }
{ _id: 2 }
Different query operators treat null values differently:
The { cancelDate : null } query matches documents that either contain the cancelDate field whose value is null or that do not contain the cancelDate field:
db.test.find( { cancelDate: null } )
The query returns both documents:
{ "_id" : 1, "cancelDate" : null }
{ "_id" : 2 }
The { cancelDate : { $type: 10 } } query matches only documents that contain the cancelDate field whose value is null; i.e. the value of the cancelDate field is of BSON Type Null (i.e. 10):
db.test.find( { cancelDate : { $type: 10 } } )
The query returns only the document that contains the null value:
{ "_id" : 1, "cancelDate" : null }
The { cancelDate : { $exists: false } } query matches documents that do not contain the cancelDate field:
db.test.find( { cancelDate : { $exists: false } } )
The query returns only the document that does not contain the cancelDate field:
{ "_id" : 2 }
Collection names can be any UTF-8 string with the following exceptions:
If your collection name includes special characters, such as the underscore character, then to access the collection use the db.getCollection() method or a similar method for your driver.
Example
To create a collection _foo and insert the { a : 1 } document, use the following operation:
db.getCollection("_foo").insert( { a : 1 } )
To perform a query, use the find() method, as in the following:
db.getCollection("_foo").find()
MongoDB cursors can return the same document more than once in some situations. [1] You can use the snapshot() method on a cursor to isolate the operation for a very specific case.
snapshot() traverses the index on the _id field and guarantees that the query will return each document (with respect to the value of the _id field) no more than once. [2]
The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.
As an alternative, if your collection has a field or fields that are never modified, you can use a unique index on this field or these fields to achieve a similar result as snapshot(). Query with hint() to explicitly force the query to use that index.
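For example, assuming a hypothetical people collection with a unique index on a never-modified ssn field:

```javascript
// Force the query to traverse the unique ssn index so that each
// document is returned at most once with respect to that field.
db.people.find().hint( { ssn: 1 } )
```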
| [1] | As a cursor returns documents other operations may interleave with the query: if some of these operations are updates that cause the document to move (in the case of a table scan, caused by document growth,) or that change the indexed field on the index used by the query; then the cursor will return the same document more than once. |
| [2] | MongoDB does not permit changes to the value of the _id field; it is not possible for a cursor that traverses this index to pass the same document more than once. |
When modeling data in MongoDB, embedding is frequently the choice for:
You should also consider embedding for performance reasons if you have a collection with a large number of small documents. Nevertheless, if small, separate documents represent the natural model for the data, then you should maintain that model.
If, however, you can group these small documents by some logical relationship and you frequently retrieve the documents by this grouping, you might consider “rolling-up” the small documents into larger documents that contain an array of subdocuments. Keep in mind that if you often only need to retrieve a subset of the documents within the group, then “rolling-up” the documents may not provide better performance.
“Rolling up” these small documents into logical groupings means that queries to retrieve a group of documents involve sequential reads and fewer random disk accesses.
Additionally, “rolling up” documents and moving common fields to the larger document benefit the index on these fields. There would be fewer copies of the common fields and there would be fewer associated key entries in the corresponding index. See Indexing Overview for more information on indexes.
Changed in version 2.2.
MongoDB allows multiple clients to read and write a single corpus of data using a locking system to ensure that all clients receive a consistent view of the data and to prevent multiple applications from modifying the exact same pieces of data at the same time. Locks help guarantee that all writes to a single document occur either in full or not at all.
Frequently Asked Questions:
MongoDB uses a readers-writer [1] lock that allows concurrent reads access to a database but gives exclusive access to a single write operation.
When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock.
Locks are “writer greedy,” which means writes have preference over reads. When both a read and write are waiting for a lock, MongoDB grants the lock to the write.
| [1] | You may be familiar with a “readers-writer” lock as “multi-reader” or “shared exclusive” lock. See the Wikipedia page on Readers-Writer Locks for more information. |
Changed in version 2.2.
Beginning with version 2.2, MongoDB implements locks on a per-database basis for most read and write operations. Some global operations, typically short lived operations involving multiple databases, still require a global “instance” wide lock. Before 2.2, there was only one “global” lock per mongod instance.
For example, if you have six databases and one takes a write lock, the other five are still available for read and write.
To report on lock utilization, use any of the following methods:
Specifically, the locks document in the output of serverStatus, or the locks field in the current operation reporting provides insight into the type of locks and amount of lock contention in your mongod instance.
To terminate an operation, use db.killOp().
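For example, from the mongo shell (the opid passed to db.killOp() is a placeholder):

```javascript
db.serverStatus().locks   // per-database lock statistics
db.currentOp()            // in-progress operations, including held locks
// db.killOp(opid)        // terminate the operation with the given opid
```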
New in version 2.0.
Read and write operations will yield their locks if the mongod receives a page fault or fetches data that is unlikely to be in memory. Yielding allows other operations that only need to access documents already in memory to complete while mongod loads documents into memory.
Additionally, write operations that affect multiple documents (i.e. db.collection.update() with the multi parameter) will yield periodically to allow read operations during these long write operations. Similarly, long-running read locks will yield periodically to ensure that write operations have the opportunity to complete.
Changed in version 2.2: The use of yielding expanded greatly in MongoDB 2.2, including the “yield for page fault.” MongoDB tracks the contents of memory and predicts whether data is available before performing a read. If MongoDB predicts that the data is not in memory, a read operation yields its lock while MongoDB loads the data into memory. Once the data is available in memory, the read reacquires the lock to complete the operation.
Changed in version 2.2.
The following table lists common database operations and the types of locks they use.
| Operation | Lock Type |
|---|---|
| Issue a query | Read lock |
| Get more data from a cursor | Read lock |
| Insert data | Write lock |
| Remove data | Write lock |
| Update data | Write lock |
| Map-reduce | Read lock and write lock, unless operations are specified as non-atomic. Portions of map-reduce jobs can run concurrently. |
| Create an index | Building an index in the foreground, which is the default, locks the database for extended periods of time. |
| db.eval() | Write lock. db.eval() blocks all other JavaScript processes. |
| eval | Write lock. If used with the nolock lock option, the eval option does not take a write lock and cannot write data to the database. |
| aggregate() | Read lock |
Certain administrative commands can exclusively lock the database for extended periods of time. In some deployments, for large databases, you may consider taking the mongod instance offline so that clients are not affected. For example, if a mongod is part of a replica set, take the mongod offline and let other members of the set service the load while maintenance is in progress.
The following administrative operations require an exclusive (i.e. write) lock on the database for extended periods:
The db.collection.group() operation takes a read lock and does not allow any other threads to execute JavaScript while it is running.
The following administrative commands lock the database but only hold the lock for a very short time:
The following MongoDB operations lock multiple databases:
Sharding improves concurrency by distributing collections over multiple mongod instances, allowing shard servers (i.e. mongos processes) to perform any number of operations concurrently to the various downstream mongod instances.
Each mongod instance is independent of the others in the shard cluster and uses the MongoDB readers-writer lock. The operations on one mongod instance do not block the operations on any others.
In replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must lock both databases at the same time to keep both databases consistent and ensure that write operations, even with replication, are “all-or-nothing” operations.
In replication, MongoDB does not apply writes serially to secondaries. Secondaries collect oplog entries in batches and then apply those batches in parallel. Secondaries do not allow reads while applying the write operations, and apply write operations in the order that they appear in the oplog.
MongoDB can apply several writes in parallel on replica set secondaries, in two phases:
A single mongod can only execute a single JavaScript operation at a time. Therefore, operations that rely on JavaScript cannot run concurrently; however, the mongod can often run other database operations concurrently with the JavaScript execution. This limitation with JavaScript affects the following operations:
The JavaScript operations within a mapReduce job are short-lived and yield many times during the operation. Portions of the map-reduce operation take database locks for reading, for writing data to a temporary collection, and for writing the final output of the operation.
The group command takes a read lock in addition to blocking all other JavaScript execution.
Unless you specify the nolock option, db.eval() takes a write lock in addition to blocking all JavaScript operations.
Only a single query that uses the $where operation can run at a time.
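For example, a sketch of running db.eval() without taking the write lock, using the nolock option of the underlying eval command (the test collection is hypothetical):

```javascript
// With nolock: true, the eval command does not take a write lock,
// but the supplied function must not write data to the database.
db.runCommand( { eval: function() { return db.test.count(); }, nolock: true } )
```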
This document answers common questions about horizontal scaling using MongoDB’s sharding.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
Frequently Asked Questions:
Sometimes.
If your data set fits on a single server, you should begin with an unsharded deployment.
Converting an unsharded database to a sharded cluster is easy and seamless, so there is little advantage in configuring sharding while your data set is small.
Still, all production deployments should use replica sets to provide high availability and disaster recovery.
To use replication with sharding, deploy each shard as a replica set.
No.
There is no automatic support in MongoDB for changing a shard key after sharding a collection. This reality underscores the importance of choosing a good shard key. If you must change a shard key after sharding a collection, the best option is to dump the data from the collection, drop the original sharded collection, configure sharding with the new key, and restore the dumped data.
See shardCollection, sh.shardCollection(), Sharded Cluster Administration, the Shard Key section in the Sharding Internals document, Deploy a Sharded Cluster, and SERVER-4000 for more information.
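A sketch of that procedure from the command line (the database, collection, new shard key, and backup path are hypothetical, and the mongo invocations assume a connection to a mongos):

```shell
# 1. Dump the collection's data to an external format.
mongodump --db records --collection people --out /backup

# 2. Drop the original sharded collection.
mongo --eval 'db.getSiblingDB("records").people.drop()'

# 3. Shard the collection again on the new key.
mongo --eval 'sh.shardCollection("records.people", { userid: 1 })'

# 4. Restore the dumped data.
mongorestore --db records --collection people /backup/records/people.bson
```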
In the current implementation, all databases in a sharded cluster have a “primary shard.” All unsharded collections within that database will reside on the same shard.
Sharding must be specifically enabled on a collection. After enabling sharding on the collection, MongoDB will assign various ranges of collection data to the different shards in the cluster. The cluster automatically corrects imbalances between shards by migrating ranges of data from one shard to another.
The mongos routes the operation to the “old” shard, where it will succeed immediately. Then the shard mongod instances will replicate the modification to the “new” shard before the sharded cluster updates that chunk’s “ownership,” which effectively finalizes the migration process.
If a shard is inaccessible or unavailable, queries will return with an error.
However, a client may set the partial query bit, which will then return results from all available shards, regardless of whether a given shard is unavailable.
If a shard is responding slowly, mongos will merely wait for the shard to return results.
Changed in version 2.0.
The exact method for distributing queries to shards in a cluster depends on the nature of the query and the configuration of the sharded cluster. Consider a sharded collection, using the shard key user_id, that has last_login and email attributes:
For a query that selects one or more values for the user_id key:
mongos determines which shard or shards contains the relevant data, based on the cluster metadata, and directs a query to the required shard or shards, and returns those results to the client.
For a query that selects user_id and also performs a sort:
mongos can make a straightforward translation of this operation into a number of queries against the relevant shards, ordered by user_id. When the sorted queries return from all shards, the mongos merges the sorted results and returns the complete result to the client.
For queries that select on last_login:
These queries must run on all shards: mongos must parallelize the query over the shards and perform a merge-sort on the email field of the documents found.
If you call the cursor.sort() method on a query in a sharded environment, the mongod for each shard will sort its results, and the mongos merges each shard’s results before returning them to the client.
If you do not use _id as the shard key, then your application/client layer must be responsible for keeping the _id field unique. It is problematic for collections to have duplicate _id values.
If you’re not sharding your collection by the _id field, then you should be sure to store a globally unique identifier in that field. The default BSON ObjectID works well in this case.
First, ensure that you’ve declared a shard key for your collection. Until you have configured the shard key, MongoDB will not create chunks, and sharding will not occur.
Next, keep in mind that the default chunk size is 64 MB. As a result, in most situations, the collection needs at least 64 MB before a migration will occur.
Additionally, the system which balances chunks among the servers attempts to avoid superfluous migrations. Depending on the number of shards, your shard key, and the amount of data, systems often require at least 10 chunks of data to trigger migrations.
You can run db.printShardingStatus() to see all the chunks present in your cluster.
Yes. mongod creates these files as backups during normal shard balancing operations.
Once these migrations are complete, you may delete these files.
You can set noMoveParanoia to true to disable this behavior.
Typically, each client maintains a connection to a mongos instance. The mongos maintains a connection pool with a single outgoing connection to each shard. For incoming connections that direct read operations to secondaries, the mongos will also need to maintain connections to each member of the replica set that provides the shard.
mongos uses a set of connection pools to communicate with each shard. These pools do not shrink when the number of clients decreases.
This can lead to an unused mongos with a large number of open connections. If the mongos is no longer in use, you can safely restart the process to close existing connections.
Connect to the mongos with the mongo shell, and run the following command:
db._adminCommand("connPoolStats");
The writeback listener is a process that opens a long poll to relay writes back from a mongod or mongos after migrations to make sure they have not gone to the wrong server. The writeback listener sends writes back to the correct server if necessary.
These messages are a key part of the sharding infrastructure and should not cause concern.
Failed migrations require no administrative intervention. Chunk moves are consistent and deterministic.
If a migration fails to complete for some reason, the cluster will retry the operation. When the migration completes successfully, the data will reside only on the new shard.
See
The wiki page that describes this process: “Changing Configuration Servers.”
mongos instances maintain a cache of the config database that holds the metadata for the sharded cluster. This metadata includes the mapping of chunks to shards.
mongos updates its cache lazily by issuing a request to a shard and discovering that its metadata is out of date. There is no way to control this behavior from the client, but you can run the flushRouterConfig command against any mongos to force it to refresh its cache.
The mongos instances will detect these changes without intervention over time. However, if you want to force the mongos to reload its configuration, run the flushRouterConfig command against each mongos directly.
The maxConns option limits the number of connections accepted by mongos.
If your client driver or application creates a large number of connections but allows them to time out rather than closing them explicitly, then it might make sense to limit the number of connections at the mongos layer.
Set maxConns to a value slightly higher than the maximum number of connections that the client creates, or the maximum size of the connection pool. This setting prevents the mongos from causing connection spikes on the individual shards. Spikes like these may disrupt the operation and memory allocation of the sharded cluster.
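For example, a sketch of starting a mongos with a connection cap (the config server hostname and the 1100 value, chosen for a hypothetical client pool of up to 1000 connections, are illustrative):

```shell
# Cap incoming connections slightly above the client pool's maximum
# to prevent connection spikes from reaching the individual shards.
mongos --configdb cfg0.example.net:27019 --maxConns 1100
```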
If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard and the shard will use the index that allows it to fulfill the query most efficiently. See this document for more information.
Shard keys can be random. Random keys ensure optimal distribution of data across the cluster.
Sharded clusters attempt to route queries to specific shards when queries include the shard key as a parameter, because these directed queries are more efficient. In many cases, random keys can make it difficult to direct queries to specific shards.
Yes. There is no requirement that documents be evenly distributed by the shard key.
However, documents that have the same shard key must reside in the same chunk and therefore on the same server. If your sharded data set has too many documents with the exact same shard key, you will not be able to distribute those documents across your sharded cluster.
You can use any field for the shard key. The _id field is a common shard key.
Be aware that ObjectId() values, which are the default value of the _id field, increment as a timestamp. As a result, when used as a shard key, all new documents inserted into the collection will initially belong to the same chunk on a single shard. Although the system will eventually divide this chunk and migrate its contents to distribute data more evenly, at any moment the cluster can only direct insert operations at a single shard. This can limit the throughput of inserts. If most of your write operations are updates or read operations rather than inserts, this limitation should not impact your performance. However, if you have a high insert volume, this may be a limitation.
If you insert documents with monotonically increasing shard keys, all inserts will initially belong to the same chunk on a single shard. Although the system will eventually divide this chunk and migrate its contents to distribute data more evenly, at any moment the cluster can only direct insert operations at a single shard. This can limit the throughput of inserts.
If most of your write operations are updates or read operations rather than inserts, this limitation should not impact your performance. However, if you have a high insert volume, a monotonically increasing shard key may be a limitation.
To address this issue, you can use a field with a value that stores the hash of a key with an ascending value. While you can compute a hashed value in your application and include this value in your documents for use as a shard key, the SERVER-2001 issue will implement this capability within MongoDB.
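A minimal sketch of such an application-side hash (the FNV-1a function, the field names, and the commented insert are illustrative assumptions, not MongoDB APIs):

```javascript
// 32-bit FNV-1a hash of a string key; deterministic, so the same
// ascending key always maps to the same hashed shard-key value.
function fnv1a(str) {
  var h = 0x811c9dc5;                     // FNV offset basis
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    // Multiply by the FNV prime 16777619, modulo 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h;
}

// Store the hash alongside the natural key and shard on the hash:
// db.events.insert( { _id: id, hashedId: fnv1a(id) } )
```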
Consider the following error messages:
ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>
ERROR: TERMINATING
mongod issues this message if, during a chunk migration, the shard could not connect to the config database to update chunk information at the end of the migration process. If the shard cannot update the config database after moveChunk, the cluster will have an inconsistent view of all chunks. In these situations, the primary member of the shard will terminate itself to prevent data inconsistency. If the secondary member can access the config database, the shard’s data will be accessible after an election. Administrators will need to resolve the chunk migration failure independently.
If you encounter this issue, contact the MongoDB User Group or 10gen support.
This document answers common questions about database replication in MongoDB.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
Frequently Asked Questions:
MongoDB supports master-slave replication and a variation on master-slave replication known as replica sets. Replica sets are the recommended replication topology.
Primary and master nodes are the nodes that can accept writes. MongoDB’s replication is “single-master:” only one node can accept write operations at a time.
In a replica set, if the current “primary” node fails or becomes inaccessible, the remaining members can autonomously elect one of the other members of the set to be the new “primary”.
By default, clients send all reads to the primary; however, read preference is configurable at the client level on a per-connection basis, which makes it possible to send reads to secondary nodes instead.
Secondary and slave nodes are read-only nodes that replicate from the primary.
Replication operates by way of an oplog, from which secondary/slave members apply new operations to themselves. This replication process is asynchronous, so secondary/slave nodes may not always reflect the latest writes to the primary. Usually, however, the gap between the primary and secondary nodes is just a few milliseconds on a local network connection.
It varies, but a replica set will select a new primary within a minute.
It may take 10-30 seconds for the members of a replica set to declare a primary inaccessible. This triggers an election. During the election, the cluster is unavailable for writes.
The election itself may take another 10-30 seconds.
Note
Eventually consistent reads, like those returned from a replica set, are only possible with a read preference that permits reads from secondary members.
Yes.
For example, a deployment may maintain a primary and secondary in an East-coast data center along with a secondary member for disaster recovery in a West-coast data center.
Yes, but not without connection failures and the obvious latency.
Members of the set will attempt to reconnect to the other members of the set in response to networking flaps. This does not require administrator intervention. However, if the network connections among the nodes in the replica set are very slow, it might not be possible for the members of the set to keep up with the replication.
If the TCP connection between the secondaries and the primary instance breaks, the set will automatically elect one of the secondary members as the new primary.
New in version 1.8.
Replica sets are the preferred replication mechanism in MongoDB. However, if your deployment requires more than 12 nodes, you must use master/slave replication.
Deprecated since version 1.6.
Replica sets replaced replica pairs in version 1.6. Replica sets are the preferred replication mechanism in MongoDB.
Journaling facilitates faster crash recovery. Prior to journaling, crashes often required database repairs or full data resync. Both were slow, and the first was unreliable.
Journaling is particularly useful for protection against power failures, especially if your replica set resides in a single data center or power circuit.
When a replica set runs with journaling, mongod instances can safely restart without any administrator intervention.
Note
Journaling requires some resource overhead for write operations. Journaling has no effect on read performance, however.
Journaling is enabled by default on all 64-bit builds of MongoDB v2.0 and greater.
Yes.
However, if you want confirmation that a given write has arrived at the server, use write concern. The getLastError command provides the facility for write concern. Since the default write concern change, the default write concern acknowledges all write operations, and unacknowledged writes must be explicitly configured. See the Drivers documentation for your driver for more information.
Some configurations do not require any arbiter instances. Arbiters vote in elections for primary but do not replicate the data like secondary members.
Replica sets require a majority of the original nodes present to elect a primary. Arbiters allow you to construct this majority without the overhead of adding replicating nodes to the system.
There are many possible replica set architectures.
If you have a three node replica set, you don’t need an arbiter.
But a common configuration consists of two replicating nodes, one of which is primary and the other is secondary, as well as an arbiter for the third node. This configuration makes it possible for the set to elect a primary in the event of a failure without requiring three replicating nodes.
You may also consider adding an arbiter to a set if it has an equal number of nodes in two facilities and network partitions between the facilities are possible. In these cases, the arbiter will break the tie between the two facilities and allow the set to elect a new primary.
See also
Arbiters never receive the contents of a collection but do exchange the following data with the rest of the replica set:
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Using MongoDB with SSL Connections for more information. Run all arbiters on secure networks, as with all MongoDB components.
See
The overview of Arbiter Members of Replica Sets.
All members of a replica set, unless the value of votes is equal to 0, vote in elections. This includes all delayed, hidden and secondary-only members, as well as the arbiters.
See also
Yes.
Factors including: different oplog sizes, different levels of storage fragmentation, and MongoDB’s data file pre-allocation can lead to some variation in storage utilization between nodes. Storage use disparities will be most pronounced when you add members at different times.
This document addresses common questions regarding MongoDB’s storage system.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
Frequently Asked Questions:
A memory-mapped file is a file with data that the operating system places in memory by way of the mmap() system call. mmap() thus maps the file to a region of virtual memory. Memory-mapped files are the critical piece of the storage engine in MongoDB. By using memory mapped files MongoDB can treat the content of its data files as if they were in memory. This provides MongoDB with an extremely fast and simple method for accessing and manipulating data.
Memory mapping assigns files to a block of virtual memory with a direct byte-for-byte correlation. Once mapped, the relationship between file and memory allows MongoDB to interact with the data in the file as if it were memory.
MongoDB uses memory mapped files for managing and interacting with all data. MongoDB memory maps data files to memory as it accesses documents. Data that isn’t accessed is not mapped to memory.
Page faults will occur if you’re attempting to access part of a memory-mapped file that isn’t in memory.
If there is free memory, then the operating system can find the page on disk and load it to memory directly. However, if there is no free memory, the operating system must:
This process, particularly on an active system, can take a long time in comparison to reading a page that is already in memory.
Page faults occur when MongoDB needs access to data that isn’t currently in active memory. A “hard” page fault refers to situations when MongoDB must access a disk to access the data. A “soft” page fault, by contrast, merely moves memory pages from one list to another, such as from an operating system file cache. In production, MongoDB will rarely encounter soft page faults.
The db.stats() method in the mongo shell, returns the current state of the “active” database. The Database Statistics Reference document outlines the meaning of the fields in the db.stats() output.
Working set represents the total body of data that the application uses in the course of normal operation. Often this is a subset of the total data size, but the specific size of the working set depends on actual moment-to-moment use of the database.
If you run a query that requires MongoDB to scan every document in a collection, the working set will expand to include every document. Depending on physical memory size, this may cause documents in the working set to “page out,” or be removed from physical memory by the operating system. The next time MongoDB needs to access these documents, MongoDB may incur a hard page fault.
For best performance, the majority of your working set should fit in RAM.
This document addresses common questions regarding MongoDB indexes.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List. See also Indexing Strategies.
Frequently Asked Questions:
No. You only need to create an index once for a single collection. After initial creation, MongoDB automatically updates the index as data changes.
While running ensureIndex() is usually harmless, if an index doesn’t exist because of ongoing administrative work, a call to ensureIndex() may disrupt database availability: index creation can render a replica set inaccessible while it is in progress. See Build Indexes on Replica Sets.
To list a collection’s indexes, use the db.collection.getIndexes() method or a similar method for your driver.
To check the sizes of the indexes on a collection, use db.collection.stats().
When an index is too large to fit into RAM, MongoDB must read the index from disk, which is a much slower operation than reading from RAM. Keep in mind that an index fits into RAM only when your server has RAM available for the index combined with the rest of the working set.
In certain cases, an index does not need to fit entirely into RAM. For details, see Indexes that Hold Only Recent Values in RAM.
To inspect how MongoDB processes a query, use the explain() method in the mongo shell, or in your application driver.
A number of factors determine what fields to index, including selectivity, fitting indexes into RAM, reusing indexes in multiple queries when possible, and creating indexes that can support all the fields in a given query. For detailed documentation on choosing which fields to index, see Indexing Strategies.
Any write operation that alters an indexed field requires an update to the index in addition to the document itself. If you update a document that causes the document to grow beyond the allotted record size, then MongoDB must update all indexes that include this document as part of the update operation.
Therefore, if your application is write-heavy, creating too many indexes might affect performance.
Building an index can be an IO-intensive operation, especially if you have a large collection. This is true on any database system that supports secondary indexes, including MySQL. If you need to build an index on a large collection, consider building the index in the background. See Index Creation Options.
If you build a large index without the background option, and if doing so causes the database to stop responding, wait for the index to finish building.
You can use the min() and max() methods to constrain the results of the cursor returned from find() by using index keys.
The $ne and $nin operators are not selective. See Create Queries that Ensure Selectivity. If you need to use these, it is often best to make sure that an additional, more selective criterion is part of the query.
Not entirely. The index can partially support these queries because it can speed the selection of the first element of the array; however, comparing all subsequent items in the array cannot use the index and must scan the documents individually.
For simple attribute lookups that don’t require sorted result sets or range queries, consider creating a field that contains an array of documents where each document has a field (e.g. attrib ) that holds a specific type of attribute. You can index this attrib field.
For example, the attrib field in the following document allows you to add an unlimited number of attribute types:
{ _id : ObjectId(...),
attrib : [
{ k: "color", v: "red" },
{ k: "shape", v: "rectangle" },
{ k: "color", v: "blue" },
{ k: "avail", v: true }
]
}
Both of the following queries could use the same { "attrib.k": 1, "attrib.v": 1 } index:
db.mycollection.find( { attrib: { $elemMatch : { k: "color", v: "blue" } } } )
db.mycollection.find( { attrib: { $elemMatch : { k: "avail", v: true } } } )
Query and update operators:
The $addToSet operator adds a value to an array only if the value is not in the array already. If the value is in the array, $addToSet returns without modifying the array. Otherwise, $addToSet behaves the same as $push. Consider the following example:
db.collection.update( { field: value }, { $addToSet: { field: value1 } } );
Here, $addToSet appends value1 to the array stored in field, only if value1 is not already a member of this array.
See the documentation of $addToSet for more information.
The $each operator is available within $addToSet, allowing you to add multiple values to the array in a single operation if they do not already exist in the field array. Consider the following prototype:
db.collection.update( { field: value }, { $addToSet: { field: { $each : [ value1, value2, value3 ] } } } );
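The set semantics of $addToSet with $each can be sketched in plain JavaScript; this simulates the server-side behavior on an in-memory array and is not a driver API:

```javascript
// Sketch: simulate $addToSet with $each on an in-memory array.
// Values already present are skipped; new values are appended in order.
function addToSetEach(array, values) {
  for (const v of values) {
    if (!array.includes(v)) {
      array.push(v);
    }
  }
  return array;
}

const tags = ["school", "book"];
addToSetEach(tags, ["book", "bag", "school", "headphone"]);
// tags is now ["school", "book", "bag", "headphone"]
```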
Syntax: { field: { $all: [ <value> , <value1> ... ] } }
$all selects the documents where the field holds an array and contains all elements (e.g. <value>, <value1>, etc.) in the array.
Consider the following example:
db.inventory.find( { tags: { $all: [ "appliances", "school", "book" ] } } )
This query selects all documents in the inventory collection where the tags field contains an array with the elements appliances, school, and book.
Therefore, the above query will match documents in the inventory collection that have a tags field holding either of the following arrays:
[ "school", "book", "bag", "headphone", "appliances" ]
[ "appliances", "school", "book" ]
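The matching rule can be sketched in plain JavaScript: a document matches when its stored array contains every element in the $all list (a simulation, not a driver API):

```javascript
// Sketch: $all matches when the document's array contains every query element.
function matchesAll(docArray, queryValues) {
  return queryValues.every((v) => docArray.includes(v));
}

matchesAll(["school", "book", "bag", "headphone", "appliances"],
           ["appliances", "school", "book"]);   // true
matchesAll(["appliances", "school", "book"],
           ["appliances", "school", "book"]);   // true
matchesAll(["school", "book"],
           ["appliances", "school", "book"]);   // false: "appliances" missing
```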
The $all operator exists to describe and specify arrays in MongoDB queries. However, you may use the $all operator to select against a non-array field, as in the following example:
db.inventory.find( { qty: { $all: [ 50 ] } } )
However, use the following form to express the same query:
db.inventory.find( { qty: 50 } )
Both queries will select all documents in the inventory collection where the value of the qty field equals 50.
Note
In most cases, MongoDB does not treat arrays as sets. This operator provides a notable exception to this approach.
In the current release queries that use the $all operator must scan all the documents that match the first element in the query array. As a result, even with an index to support the query, the operation may be long running, particularly when the first element in the array is not very selective.
New in version 2.0.
Syntax: { $and: [ { <expression1> }, { <expression2> } , ... , { <expressionN> } ] }
$and performs a logical AND operation on an array of two or more expressions (e.g. <expression1>, <expression2>, etc.) and selects the documents that satisfy all the expressions in the array. The $and operator uses short-circuit evaluation. If the first expression (e.g. <expression1>) evaluates to false, MongoDB will not evaluate the remaining expressions.
Consider the following example:
db.inventory.find({ $and: [ { price: 1.99 }, { qty: { $lt: 20 } }, { sale: true } ] } )
This query will select all documents in the inventory collection where:
MongoDB provides an implicit AND operation when specifying a comma separated list of expressions. For example, you may write the above query as:
db.inventory.find( { price: 1.99, qty: { $lt: 20 } , sale: true } )
If, however, a query requires an AND operation on the same field such as { price: { $ne: 1.99 } } AND { price: { $exists: true } }, then either use the $and operator for the two separate expressions or combine the operator expressions for the field { price: { $ne: 1.99, $exists: true } }.
Consider the following examples:
db.inventory.update( { $and: [ { price: { $ne: 1.99 } }, { price: { $exists: true } } ] }, { $set: { qty: 15 } } )
db.inventory.update( { price: { $ne: 1.99, $exists: true } } , { $set: { qty: 15 } } )
Both update() operations will set the value of the qty field in documents where:
The $atomic isolation operator isolates a write operation that affects multiple documents from other write operations.
Note
The $atomic isolation operator does not provide “all-or-nothing” atomicity for write operations.
Consider the following example:
db.foo.update( { field1 : 1 , $atomic : 1 }, { $inc : { field2 : 1 } } , { multi: true } )
Without the $atomic operator, multi-updates allow other operations to interleave with this update. If these interleaved operations contain writes, the update operation may produce unexpected results. By specifying $atomic you can guarantee isolation for the entire multi-update.
See also
See db.collection.update() for more information about the db.collection.update() method.
The $bit operator performs a bitwise update of a field. Only use this with integer fields. For example:
db.collection.update( { field: 1 }, { $bit: { field: { and: 5 } } } );
Here, the $bit operator updates the integer value of the field named field with a bitwise and: 5 operation. This operator only works with number types.
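For instance, with a hypothetical stored value of 13 rather than 1, the arithmetic behind { $bit: { field: { and: 5 } } } looks like this:

```javascript
// Sketch: the arithmetic behind a bitwise and: 5 update.
// A field value of 13 (binary 1101) ANDed with 5 (binary 0101) yields 5.
const before = 13;
const mask = 5;
const after = before & mask; // 0b1101 & 0b0101 === 0b0101 === 5
```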
New in version 1.4.
The $box operator specifies a rectangular shape for the $within operator in geospatial queries. To use the $box operator, you must specify the bottom left and top right corners of the rectangle in an array object. Consider the following example:
db.collection.find( { loc: { $within: { $box: [ [0,0], [100,100] ] } } } )
This will return all the documents that are within the box having points at: [0,0], [0,100], [100,0], and [100,100].
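A plain-JavaScript sketch of the inclusion test that $box performs, assuming flat geometry and a box given by its bottom-left and top-right corners (a simulation, not a driver API):

```javascript
// Sketch: point-in-box test like $within: { $box: ... } (flat geometry).
function withinBox(point, box) {
  const [[x1, y1], [x2, y2]] = box; // bottom-left, top-right corners
  const [x, y] = point;
  return x >= x1 && x <= x2 && y >= y1 && y <= y2;
}

withinBox([50, 50], [[0, 0], [100, 100]]);  // true
withinBox([150, 50], [[0, 0], [100, 100]]); // false: outside on the x axis
```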
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
New in version 1.4.
This specifies a circle shape for the $within operator in geospatial queries. To define the bounds of a query using $center, you must specify:
- the center point, and
- the radius
Consider the following example:
db.collection.find( { location: { $within: { $center: [ [0,0], 10 ] } } } );
The above command returns all the documents that fall within a 10 unit radius of the point [0,0].
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
New in version 1.8.
The $centerSphere operator is the spherical equivalent of the $center operator. $centerSphere uses spherical geometry to calculate distances in a circle specified by a point and radius.
Consider the following example:
db.collection.find( { loc: { $centerSphere: [ [0,0], 10 / 3959 ] } } )
This query will return all documents within a 10 mile radius of [0,0] using a spherical geometry to calculate distances.
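The radius here is expressed in radians: miles divided by the Earth's approximate radius of 3959 miles. A sketch of the distance test using the haversine formula, with coordinates as [longitude, latitude] in degrees (a simulation, not a driver API):

```javascript
// Sketch: spherical distance check like $centerSphere (radius in radians).
function sphericalDistanceRadians([lon1, lat1], [lon2, lat2]) {
  const rad = (d) => (d * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

// A point 0.1 degrees of longitude from [0,0] is about 0.00175 radians away,
// which falls inside a 10-mile (10 / 3959 ≈ 0.00253 radian) radius.
const within = sphericalDistanceRadians([0, 0], [0.1, 0]) <= 10 / 3959;
```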
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
The $comment makes it possible to attach a comment to a query. Because these comments propagate to the profile log, adding $comment modifiers can make your profile data much easier to interpret and trace. Use one of the following forms:
db.collection.find( { <query> } )._addSpecial( "$comment", <comment> )
db.collection.find( { $query: { <query> }, $comment: <comment> } )
Note
The $each operator is only used with $addToSet. See the documentation of $addToSet for more information.
The $each operator is available within $addToSet, allowing you to add multiple values to the array in a single operation if they do not already exist in the field array. Consider the following prototype:
db.collection.update( { field: value }, { $addToSet: { field: { $each : [ value1, value2, value3 ] } } } );
See also
New in version 1.4.
The $elemMatch operator matches more than one component within an array element. For example,
db.collection.find( { array: { $elemMatch: { value1: 1, value2: { $gt: 1 } } } } );
returns all documents in collection where the array array satisfies all of the conditions in the $elemMatch expression, that is, where the value of value1 is 1 and the value of value2 is greater than 1. Matching arrays must have at least one element that matches all specified criteria. Therefore, the following document would not match the above query:
{ array: [ { value1:1, value2:0 }, { value1:2, value2:2 } ] }
while the following document would match this query:
{ array: [ { value1:1, value2:0 }, { value1:1, value2:2 } ] }
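The requirement that a single array element satisfy every condition can be sketched in plain JavaScript (a simulation, not a driver API):

```javascript
// Sketch: $elemMatch requires one array element to satisfy every condition,
// not different elements satisfying different conditions.
function elemMatch(array, predicate) {
  return array.some(predicate);
}

const p = (e) => e.value1 === 1 && e.value2 > 1;

elemMatch([{ value1: 1, value2: 0 }, { value1: 2, value2: 2 }], p); // false
elemMatch([{ value1: 1, value2: 0 }, { value1: 1, value2: 2 }], p); // true
```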
Syntax: { field: { $exists: <boolean> } }
$exists selects the documents that contain the field if <boolean> is true. If <boolean> is false, the query only returns the documents that do not contain the field. Documents that contain the field but have the value null are not returned.
MongoDB $exists does not correspond to SQL operator exists. For SQL exists, refer to the $in operator.
Consider the following example:
db.inventory.find( { qty: { $exists: true, $nin: [ 5, 15 ] } } )
This query will select all documents in the inventory collection where the qty field exists and its value is neither 5 nor 15.
The $explain operator provides information on the query plan. It returns a document that describes the process and indexes used to return the query. This may provide useful insight when attempting to optimize a query.
The mongo shell also provides the explain() method:
db.collection.find().explain()
You can also specify the option in either of the following forms:
db.collection.find()._addSpecial( "$explain", 1 )
db.collection.find( { $query: {}, $explain: 1 } )
For details on the output, see Explain Output.
$explain runs the actual query to determine the result. Although there are some differences between running the query with $explain and running without, generally, the performance will be similar between the two. So, if the query is slow, the $explain operation is also slow.
Additionally, the $explain operation reevaluates a set of candidate query plans, which may cause the $explain operation to perform differently than a normal query. As a result, these operations generally provide an accurate account of how MongoDB would perform the query, but do not reflect the running time of these queries.
To determine the performance of a particular index, you can use hint() in conjunction with explain(), as in the following example:
db.products.find().hint( { type: 1 } ).explain()
When you run explain() with hint(), the query optimizer does not reevaluate the query plans.
Note
In some situations, the explain() operation may differ from the actual query plan used by MongoDB in a normal query.
The explain() operation evaluates the set of query plans and reports on the winning plan for the query. In normal operations the query optimizer caches winning query plans and uses them for similar related queries in the future. As a result MongoDB may sometimes select query plans from the cache that are different from the plan displayed using explain().
See also
Syntax: {field: {$gt: value} }
$gt selects those documents where the value of the field is greater than (i.e. >) the specified value.
Consider the following example:
db.inventory.find( { qty: { $gt: 20 } } )
This query will select all documents in the inventory collection where the qty field value is greater than 20.
Consider the following example which uses the $gt operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $gt: 2 } }, { $set: { price: 9.99 } } )
This update() operation will set the value of the price field in the documents that contain the embedded document carrier whose fee field value is greater than 2.
Syntax: {field: {$gte: value} }
$gte selects the documents where the value of the field is greater than or equal to (i.e. >=) a specified value (e.g. value).
Consider the following example:
db.inventory.find( { qty: { $gte: 20 } } )
This query would select all documents in inventory where the qty field value is greater than or equal to 20.
Consider the following example which uses the $gte operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $gte: 2 } }, { $set: { price: 9.99 } } )
This update() operation will set the value of the price field in the documents that contain the embedded document carrier whose fee field value is greater than or equal to 2.
The $hint operator forces the query optimizer to use a specific index to fulfill the query. Use $hint for testing query performance and indexing strategies. Consider the following form:
db.collection.find().hint( { age: 1 } )
This operation returns all documents in the collection named collection using the index on the age field. Use this operator to override MongoDB’s default index selection process and pick indexes manually.
You can also specify the option in either of the following forms:
db.collection.find()._addSpecial( "$hint", { age : 1 } )
db.collection.find( { $query: {}, $hint: { age : 1 } } )
Syntax: { field: { $in: [<value1>, <value2>, ... <valueN> ] } }
$in selects the documents where the field value equals any value in the specified array (e.g. <value1>, <value2>, etc.)
Consider the following example:
db.inventory.find( { qty: { $in: [ 5, 15 ] } } )
This query selects all documents in the inventory collection where the qty field value is either 5 or 15. Although you can express this query using the $or operator, choose the $in operator rather than the $or operator when performing equality checks on the same field.
If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (e.g. <value1>, <value2>, etc.)
Consider the following example:
db.inventory.update( { tags: { $in: ["appliances", "school"] } }, { $set: { sale:true } } )
This update() operation will set the sale field value in the inventory collection where the tags field holds an array with at least one element matching an element in the array ["appliances", "school"].
The $inc operator increments a value by a specified amount if field is present in the document. If the field does not exist, $inc sets field to the number value. For example:
db.collection.update( { field: value }, { $inc: { field1: amount } } );
In this example, for documents in collection where field has the value value, the value of field1 increments by the value of amount. The above operation only increments the first matching document unless you specify multi-update:
db.collection.update( { age: 20 }, { $inc: { age: 1 } } );
db.collection.update( { name: "John" }, { $inc: { age: 1 } } );
In the first example, the operation increases the age field by one in documents that have an age field with the value of 20. In the second example, the operation increases the value of the age field by one in documents where the name field has a value of John.
$inc accepts positive and negative incremental amounts.
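A plain-JavaScript sketch of these semantics, including the field-creation case (a simulation of the server-side behavior, not a driver API):

```javascript
// Sketch: $inc semantics on a plain object. If the field is missing,
// it is created with the increment amount; otherwise it is incremented.
function inc(doc, field, amount) {
  doc[field] = (doc[field] ?? 0) + amount;
  return doc;
}

inc({ age: 20 }, "age", 1);  // { age: 21 }
inc({ age: 20 }, "age", -5); // { age: 15 } — negative amounts decrement
inc({}, "age", 1);           // { age: 1 } — field created
```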
Syntax: {field: {$lt: value} }
$lt selects the documents where the value of the field is less than (i.e. <) the specified value.
Consider the following example:
db.inventory.find( { qty: { $lt: 20 } } )
This query will select all documents in the inventory collection where the qty field value is less than 20.
Consider the following example which uses the $lt operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $lt: 20 } }, { $set: { price: 9.99 } } )
This update() operation will set the price field value in the documents that contain the embedded document carrier whose fee field value is less than 20.
Syntax: { field: { $lte: value} }
$lte selects the documents where the value of the field is less than or equal to (i.e. <=) the specified value.
Consider the following example:
db.inventory.find( { qty: { $lte: 20 } } )
This query will select all documents in the inventory collection where the qty field value is less than or equal to 20.
Consider the following example which uses the $lte operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $lte: 5 } }, { $set: { price: 9.99 } } )
This update() operation will set the price field value in the documents that contain the embedded document carrier whose fee field value is less than or equal to 5.
Specify a $max value to set the exclusive upper bound for a specific index in order to constrain the results of find(). The mongo shell provides the cursor.max() wrapper method:
db.collection.find( { <query> } ).max( { field1: <max value>, ... fieldN: <max valueN> } )
You can also specify the option with either of the two forms:
db.collection.find( { <query> } )._addSpecial( "$max", { field1: <max value1>, ... fieldN: <max valueN> } )
db.collection.find( { $query: { <query> }, $max: { field1: <max value1>, ... fieldN: <max valueN> } } )
The $max specifies the upper bound for all keys of a specific index in order.
Consider the following operations on a collection named collection that has an index { age: 1 }:
db.collection.find( { <query> } ).max( { age: 100 } )
This operation limits the query to those documents where the field age is less than 100 using the index { age: 1 }.
You can explicitly specify the corresponding index with cursor.hint(). Otherwise, MongoDB selects the index using the fields in the index bounds; however, if multiple indexes exist on the same fields with different sort orders, the selection of the index may be ambiguous.
Consider a collection named collection that has the following two indexes:
{ age: 1, type: -1 }
{ age: 1, type: 1 }
Without explicitly using cursor.hint(), MongoDB may select either index for the following operation:
db.collection.find().max( { age: 50, type: 'B' } )
Use $max alone or in conjunction with $min to limit results to a specific range for the same index, as in the following example:
db.collection.find().min( { age: 20 } ).max( { age: 25 } )
Note
Because cursor.max() requires an index on a field, and forces the query to use this index, you may prefer the $lt operator for the query if possible. Consider the following example:
db.collection.find( { _id: 7 } ).max( { age: 25 } )
The query uses the index on the age field, even if the index on _id may be better.
The $maxDistance operator specifies an upper bound to limit the results of a geolocation query. See below, where the $maxDistance operator narrows the results of the $near query:
db.collection.find( { location: { $near: [100,100], $maxDistance: 10 } } );
This query will return documents with location fields from collection that have values with a distance of 10 or fewer units from the point [100,100]. $near returns results ordered by their distance from [100,100]. This operation will return the first 100 results unless you modify the query with the cursor.limit() method.
Specify the value of the $maxDistance argument in the same units as the document coordinate system.
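The filter-and-order behavior can be sketched in plain JavaScript, using flat (non-spherical) distance as in the example above; the documents and field names are illustrative:

```javascript
// Sketch: $near with $maxDistance filters by distance, then orders results
// from nearest to farthest (flat geometry).
function nearQuery(docs, center, maxDistance) {
  const dist = ([x, y]) => Math.hypot(x - center[0], y - center[1]);
  return docs
    .filter((d) => dist(d.location) <= maxDistance)
    .sort((a, b) => dist(a.location) - dist(b.location));
}

const docs = [
  { _id: 1, location: [105, 100] }, // distance 5
  { _id: 2, location: [100, 102] }, // distance 2
  { _id: 3, location: [200, 200] }, // distance ~141, excluded
];
nearQuery(docs, [100, 100], 10); // returns _id 2, then _id 1
```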
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
Constrains the query to only scan the specified number of documents when fulfilling the query. Use one of the following forms:
db.collection.find( { <query> } )._addSpecial( "$maxScan" , <number> )
db.collection.find( { $query: { <query> }, $maxScan: <number> } )
Use this modifier to prevent potentially long running queries from disrupting performance by scanning through too much data.
Specify a $min value to set the inclusive lower bound for a specific index in order to constrain the results of find(). The mongo shell provides the cursor.min() wrapper method:
db.collection.find( { <query> } ).min( { field1: <min value>, ... fieldN: <min valueN>} )
You can also specify the option with either of the two forms:
db.collection.find( { <query> } )._addSpecial( "$min", { field1: <min value1>, ... fieldN: <min valueN> } )
db.collection.find( { $query: { <query> }, $min: { field1: <min value1>, ... fieldN: <min valueN> } } )
The $min specifies the lower bound for all keys of a specific index in order.
Consider the following operations on a collection named collection that has an index { age: 1 }:
db.collection.find().min( { age: 20 } )
These operations limit the query to those documents where the field age is at least 20 using the index { age: 1 }.
You can explicitly specify the corresponding index with cursor.hint(). Otherwise, MongoDB selects the index using the fields in the index bounds; however, if multiple indexes exist on the same fields with different sort orders, the selection of the index may be ambiguous.
Consider a collection named collection that has the following two indexes:
{ age: 1, type: -1 }
{ age: 1, type: 1 }
Without explicitly using cursor.hint(), it is unclear which index the following operation will select:
db.collection.find().min( { age: 20, type: 'C' } )
You can use $min in conjunction with $max to limit results to a specific range for the same index, as in the following example:
db.collection.find().min( { age: 20 } ).max( { age: 25 } )
Note
Because cursor.min() requires an index on a field, and forces the query to use this index, you may prefer the $gte operator for the query if possible. Consider the following example:
db.collection.find( { _id: 7 } ).min( { age: 25 } )
The query will use the index on the age field, even if the index on _id may be better.
Syntax: { field: { $mod: [ divisor, remainder ]} }
$mod selects the documents where the field value divided by the divisor has the specified remainder.
Consider the following example:
db.inventory.find( { qty: { $mod: [ 4, 0 ] } } )
This query will select all documents in the inventory collection where the qty field value modulo 4 equals 0, such as documents with qty value equal to 0 or 12.
In some cases, you can query using the $mod operator rather than the more expensive $where operator. Consider the following example using the $mod operator:
db.inventory.find( { qty: { $mod: [ 4, 0 ] } } )
The above query is less expensive than the following query which uses the $where operator:
db.inventory.find( { $where: "this.qty % 4 == 0" } )
Use the $natural operator to use natural order for the results of a sort operation. Natural order refers to the order of documents in the file on disk.
The $natural operator uses the following syntax to return documents in the order they exist on disk:
db.collection.find().sort( { $natural: 1 } )
Use -1 to return documents in the reverse order as they occur on disk:
db.collection.find().sort( { $natural: -1 } )
Syntax: {field: {$ne: value} }
$ne selects the documents where the value of the field is not equal (i.e. !=) to the specified value. This includes documents that do not contain the field.
Consider the following example:
db.inventory.find( { qty: { $ne: 20 } } )
This query will select all documents in the inventory collection where the qty field value does not equal 20, including those documents that do not contain the qty field.
Consider the following example which uses the $ne operator with a field from an embedded document:
db.inventory.update( { "carrier.state": { $ne: "NY" } }, { $set: { qty: 20 } } )
This update() operation will set the qty field value in the documents that contain the embedded document carrier whose state field value does not equal "NY", or where the state field or the carrier embedded document does not exist.
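The key detail of $ne, that documents missing the field also match, can be illustrated with a small Python sketch over hypothetical documents:

```python
# Hypothetical inventory documents; _id 3 has no qty field at all.
inventory = [
    {"_id": 1, "qty": 20},
    {"_id": 2, "qty": 30},
    {"_id": 3},
]

def ne_match(documents, field, value):
    """Emulate { field: { $ne: value } }: documents where the field value
    does not equal value, including documents missing the field."""
    return [d for d in documents if d.get(field) != value]

matched = ne_match(inventory, "qty", 20)  # _id 2 and the field-less _id 3
```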
The $near operator takes an argument, coordinates in the form of [x, y], and returns a list of objects sorted by distance from those coordinates. See the following example:
db.collection.find( { location: { $near: [100,100] } } );
This query will return 100 ordered records with a location field in collection. Specify a different limit using the cursor.limit(), or another geolocation operator, or a non-geospatial operator to limit the results of the query.
Note
The behavior of specifying a batch size (i.e. batchSize()) in conjunction with queries that use $near is not defined. See SERVER-5236 for more information.
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
New in version 1.8.
The $nearSphere operator is the spherical equivalent of the $near operator. $nearSphere returns all documents near a point, calculating distances using spherical geometry.
db.collection.find( { loc: { $nearSphere: [0,0] } } )
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
Syntax: { field: { $nin: [ <value1>, <value2> ... <valueN> ]} }
$nin selects the documents where the field value is not in the specified array or the field does not exist.
Consider the following query:
db.inventory.find( { qty: { $nin: [ 5, 15 ] } } )
This query will select all documents in the inventory collection where the qty field value does not equal 5 nor 15. The selected documents will include those documents that do not contain the qty field.
If the field holds an array, then the $nin operator selects the documents whose field holds an array with no element equal to a value in the specified array (e.g. <value1>, <value2>, etc.).
Consider the following query:
db.inventory.update( { tags: { $nin: [ "appliances", "school" ] } }, { $set: { sale: false } } )
This update() operation will set the sale field value in the inventory collection where the tags field holds an array with no elements matching an element in the array ["appliances", "school"] or where a document does not contain the tags field.
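A Python sketch of the $nin matching rules, including the array-field case and the missing-field case (documents hypothetical):

```python
# Hypothetical inventory documents covering scalar, array, and missing fields.
inventory = [
    {"_id": 1, "qty": 5},
    {"_id": 2, "qty": 20},
    {"_id": 3, "tags": ["appliances", "fan"]},
    {"_id": 4, "tags": ["books"]},
    {"_id": 5},
]

def nin_match(documents, field, values):
    """Emulate { field: { $nin: values } }: match when no value of the field
    (or no element, if the field holds an array) appears in values;
    documents missing the field also match."""
    out = []
    for d in documents:
        if field not in d:
            out.append(d)
            continue
        v = d[field]
        elements = v if isinstance(v, list) else [v]
        if not any(e in values for e in elements):
            out.append(d)
    return out

by_qty = nin_match(inventory, "qty", [5, 15])
by_tags = nin_match(inventory, "tags", ["appliances", "school"])
```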
Syntax: { $nor: [ { <expression1> }, { <expression2> }, ... { <expressionN> } ] }
$nor performs a logical NOR operation on an array of two or more <expressions> and selects the documents that fail all the <expressions> in the array.
Consider the following example:
db.inventory.find( { $nor: [ { price: 1.99 }, { qty: { $lt: 20 } }, { sale: true } ] } )
This query will select all documents in the inventory collection where the price field value does not equal 1.99, the qty field value is not less than 20, and the sale field value is not equal to true, including those documents that do not contain these field(s).
The exception in returning documents that do not contain the field in the $nor expression is when the $nor operator is used with the $exists operator.
Consider the following query which uses only the $nor operator:
db.inventory.find( { $nor: [ { price: 1.99 }, { sale: true } ] } )
This query will return all documents where the price field value does not equal 1.99 and the sale field value does not equal true, including documents where either field, or both, do not exist.
Compare that with the following query which uses the $nor operator with the $exists operator:
db.inventory.find( { $nor: [ { price: 1.99 }, { price: { $exists: false } },
{ sale: true }, { sale: { $exists: false } } ] } )
This query will return only those documents that contain the price field whose value does not equal 1.99 and contain the sale field whose value does not equal true.
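The difference between the plain $nor query and the $nor plus $exists form can be sketched in Python, modeling each expression as a predicate (documents hypothetical):

```python
# Hypothetical inventory documents; _id 4 lacks sale, _id 5 lacks both fields.
inventory = [
    {"_id": 1, "price": 1.99, "sale": True},
    {"_id": 2, "price": 1.99, "sale": False},
    {"_id": 3, "price": 2.99, "sale": False},
    {"_id": 4, "price": 2.99},
    {"_id": 5},
]

def nor_match(documents, predicates):
    """Emulate { $nor: [ ... ] }: keep documents that fail every predicate."""
    return [d for d in documents if not any(p(d) for p in predicates)]

# { $nor: [ { price: 1.99 }, { sale: true } ] }: a missing field fails the
# equality predicate, so documents 4 and 5 still match.
plain = nor_match(inventory, [
    lambda d: d.get("price") == 1.99,
    lambda d: d.get("sale") is True,
])

# Adding { $exists: false } clauses excludes documents missing either field.
with_exists = nor_match(inventory, [
    lambda d: d.get("price") == 1.99,
    lambda d: "price" not in d,
    lambda d: d.get("sale") is True,
    lambda d: "sale" not in d,
])
```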
Syntax: { field: { $not: { <operator-expression> } } }
$not performs a logical NOT operation on the specified <operator-expression> and selects the documents that do not match the <operator-expression>. This includes documents that do not contain the field.
Consider the following query:
db.inventory.find( { price: { $not: { $gt: 1.99 } } } )
This query will select all documents in the inventory collection where the price field value is less than or equal to 1.99 or the price field does not exist.
{ $not: { $gt: 1.99 } } is different from the $lte operator. { $lte: 1.99 } returns only the documents where the price field exists and its value is less than or equal to 1.99.
Remember that the $not operator only affects other operators and cannot check fields and documents independently. So, use the $not operator for logical disjunctions and the $ne operator to test the contents of fields directly.
Consider the following behaviors when using the $not operator:
The operation of the $not operator is consistent with the behavior of other operators but may yield unexpected results with some data types like arrays.
The $not operator does not support operations with the $regex operator. Instead use //, or, in your driver interface, use your language's regular expression capability to create regular expression objects.
Consider the following example which uses the pattern match expression //:
db.inventory.find( { item: { $not: /^p.*/ } } )
The query will select all documents in the inventory collection where the item field value does not start with the letter p.
If using PyMongo’s re.compile(), you can write the above query as:
import re
for noMatch in db.inventory.find( { "item": { "$not": re.compile("^p.*") } } ):
print noMatch
New in version 1.6.
Changed in version 2.0: You may nest $or operations; however, these expressions are not as efficiently optimized as top-level $or operations.
Syntax: { $or: [ { <expression1> }, { <expression2> }, ... , { <expressionN> } ] }
The $or operator performs a logical OR operation on an array of two or more <expressions> and selects the documents that satisfy at least one of the <expressions>.
Consider the following query:
db.inventory.find( { price:1.99, $or: [ { qty: { $lt: 20 } }, { sale: true } ] } )
This query will select all documents in the inventory collection where the price field value equals 1.99 and either the qty field value is less than 20 or the sale field value is true.
Consider the following example which uses the $or operator to select fields from embedded documents:
db.inventory.update( { $or: [ { price:10.99 }, { "carrier.state": "NY"} ] }, { $set: { sale: true } } )
This update() operation will set the value of the sale field in the documents in the inventory collection where the price field value equals 10.99 or the carrier embedded document contains a state field whose value equals "NY".
When using $or with <expressions> that are equality checks for the value of the same field, choose the $in operator over the $or operator.
Consider the query to select all documents in the inventory collection where the qty field value equals either 20 or 50, and either the price field value equals 1.99 or the sale field value equals true.
The most effective query would be:
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ], qty: { $in: [20, 50] } } )
Consider the following behaviors when using the $or operator:
When using indexes with $or queries, remember that each clause of an $or query will execute in parallel. These clauses can each use their own index. Consider the following query:
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ] } )
For this query, you would create one index on price ( db.inventory.ensureIndex( { price: 1 } ) ) and another index on sale ( db.inventory.ensureIndex( { sale: 1 } ) ) rather than a compound index.
Also, when using the $or operator with the sort() method in a query, the query will not use the indexes on the $or fields. Consider the following query which adds a sort() method to the above query:
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ] } ).sort({item:1})
This modified query will not use the index on price nor the index on sale.
You cannot use the $or operator with 2d geospatial queries.
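A Python sketch of the $or evaluation model, and of why $in is interchangeable with an $or of equality checks on one field (documents hypothetical):

```python
# Hypothetical inventory documents.
inventory = [
    {"_id": 1, "price": 1.99, "qty": 20, "sale": False},
    {"_id": 2, "price": 2.99, "qty": 50, "sale": True},
    {"_id": 3, "price": 2.99, "qty": 15, "sale": False},
]

def or_match(documents, predicates):
    """Emulate { $or: [ ... ] }: keep documents satisfying any predicate."""
    return [d for d in documents if any(p(d) for p in predicates)]

# { $or: [ { qty: 20 }, { qty: 50 } ] } selects the same documents as the
# preferred { qty: { $in: [20, 50] } } form.
via_or = or_match(inventory, [lambda d: d.get("qty") == 20,
                              lambda d: d.get("qty") == 50])
via_in = [d for d in inventory if d.get("qty") in [20, 50]]
```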
The $orderby operator sorts the results of a query in ascending or descending order.
The mongo shell provides the cursor.sort() method:
db.collection.find().sort( { age: -1 } )
You can also specify the option in either of the following forms:
db.collection.find()._addSpecial( "$orderby", { age : -1 } )
db.collection.find( { $query: {}, $orderby: { age : -1 } } )
These examples return all documents in the collection named collection sorted by the age field in descending order. Specify a value to $orderby of negative one (e.g. -1, as above) to sort in descending order or a positive value (e.g. 1) to sort in ascending order.
Unless you have an index for the specified key pattern, use $orderby in conjunction with $maxScan and/or cursor.limit() to avoid requiring MongoDB to perform a large in-memory sort. The cursor.limit() increases the speed and reduces the amount of memory required to return this query by way of an optimized algorithm.
New in version 1.9.
Use $polygon to specify a polygon for a bounded query using the $within operator for geospatial queries. To define the polygon, you must specify an array of coordinate points, as in the following:
[ [ x1,y1 ], [x2,y2], [x3,y3] ]
The last point specified is always implicitly connected to the first. You can specify as many points, and therefore sides, as you like. Consider the following bounded query for documents with coordinates within a polygon:
db.collection.find( { loc: { $within: { $polygon: [ [0,0], [3,6], [6,0] ] } } } )
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
The $pop operator removes the first or last element of an array. Pass $pop a value of 1 to remove the last element in an array and a value of -1 to remove the first element of an array. Consider the following syntax:
db.collection.update( {field: value }, { $pop: { field: 1 } } );
This operation removes the last item of the array in field in the document that matches the query statement { field: value }. The following example removes the first item of the same array:
db.collection.update( {field: value }, { $pop: { field: -1 } } );
Be aware of the following $pop behaviors: the $pop operation fails if field is not an array, and if the $pop operator removes the last item in field, field will then hold an empty array.
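A Python sketch of the $pop semantics on a single hypothetical document:

```python
# Hypothetical document with an array field.
doc = {"_id": 1, "scores": [80, 85, 90]}

def pop(document, field, direction):
    """Emulate { $pop: { field: direction } } on a single document:
    1 removes the last element, -1 removes the first."""
    if direction == 1:
        document[field].pop()      # remove last element
    elif direction == -1:
        document[field].pop(0)     # remove first element
    return document

pop(doc, "scores", 1)   # scores -> [80, 85]
pop(doc, "scores", -1)  # scores -> [85]
```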
New in version 1.1.
Syntax: { "<array>.$" : value }
The positional $ operator identifies an element in an array field to update without explicitly specifying the position of the element in the array. The positional $ operator, when used with the update() method, acts as a placeholder for the first match of the update query selector:
db.collection.update( { <query selector> }, { <update operator>: { "array.$" : value } } )
The array field must appear as part of the query selector.
Consider the following collection students with the following documents:
{ "_id" : 1, "grades" : [ 80, 85, 90 ] }
{ "_id" : 2, "grades" : [ 88, 90, 92 ] }
{ "_id" : 3, "grades" : [ 85, 100, 90 ] }
To update 80 to 82 in the grades array in the first document, use the positional $ operator if you do not know the position of the element in the array:
db.students.update( { _id: 1, grades: 80 }, { $set: { "grades.$" : 82 } } )
Remember that the positional $ operator acts as a placeholder for the first match of the update query selector.
The positional $ operator facilitates updates to arrays that contain embedded documents. Use the positional $ operator to access the fields in the embedded documents with the dot notation on the $ operator.
db.collection.update( { <query selector> }, { <update operator>: { "array.$.field" : value } } )
Consider the following document in the students collection whose grades field value is an array of embedded documents:
{ "_id" : 4, "grades" : [ { grade: 80, mean: 75, std: 8 },
{ grade: 85, mean: 90, std: 5 },
{ grade: 90, mean: 85, std: 3 } ] }
Use the positional $ operator to update the value of the std field in the embedded document with the grade of 85:
db.students.update( { _id: 4, "grades.grade": 85 }, { $set: { "grades.$.std" : 6 } } )
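The first-match placeholder behavior of the positional $ operator can be sketched in Python for the scalar-array case, using the students data from above:

```python
# Documents mirroring the students examples above.
students = [
    {"_id": 1, "grades": [80, 85, 90]},
    {"_id": 2, "grades": [88, 90, 92]},
]

def update_first_match(documents, query_id, field, match_value, new_value):
    """Emulate update( { _id: query_id, field: match_value },
    { $set: { "field.$": new_value } } ): replace only the FIRST array
    element that matched the query selector."""
    for d in documents:
        if d["_id"] == query_id and match_value in d[field]:
            i = d[field].index(match_value)  # first matching position
            d[field][i] = new_value
            return d
    return None

update_first_match(students, 1, "grades", 80, 82)  # 80 -> 82 in _id 1
```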
Consider the following behaviors when using the positional $ operator:
The $pull operator removes all instances of a value from an existing array. Consider the following example:
db.collection.update( { field: value }, { $pull: { field: value1 } } );
$pull removes the value value1 from the array in field, in the document that matches the query statement { field: value } in collection. If value1 existed multiple times in the field array, $pull would remove all instances of value1 in this array.
The $pullAll operator removes multiple values from an existing array. $pullAll provides the inverse operation of the $pushAll operator. Consider the following example:
db.collection.update( { field: value }, { $pullAll: { field1: [ value1, value2, value3 ] } } );
Here, $pullAll removes [ value1, value2, value3 ] from the array in field1, in the document that matches the query statement { field: value } in collection.
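A Python sketch of the $pull and $pullAll removal semantics on a hypothetical document:

```python
# Hypothetical document; note the duplicate 7 values.
doc = {"_id": 1, "votes": [3, 5, 6, 7, 7, 8]}

def pull(document, field, value):
    """Emulate { $pull: { field: value } }: remove ALL instances of value."""
    document[field] = [v for v in document[field] if v != value]

def pull_all(document, field, values):
    """Emulate { $pullAll: { field: values } }: remove all instances of
    every listed value."""
    document[field] = [v for v in document[field] if v not in values]

pull(doc, "votes", 7)           # votes -> [3, 5, 6, 8]
pull_all(doc, "votes", [3, 8])  # votes -> [5, 6]
```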
The $push operator appends a specified value to an array. For example:
db.collection.update( { field: value }, { $push: { field: value1 } } );
Here, $push appends value1 to the array in field in the document that matches the query statement { field: value }. Be aware of the following behaviors: if the field is absent in the document to update, $push adds the array field with value1 as its element; the operation fails if the field is not an array; and if value1 is itself an array, $push appends the whole array as a single element.
The $pushAll operator is similar to the $push operator but adds the ability to append several values to an array at once.
db.collection.update( { field: value }, { $pushAll: { field1: [ value1, value2, value3 ] } } );
Here, $pushAll appends the values in [ value1, value2, value3 ] to the array in field1 in the document matched by the statement { field: value } in collection.
If you specify a single value, $pushAll will behave as $push.
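A Python sketch of the $push and $pushAll append semantics, including creation of a missing array field (document hypothetical):

```python
# Hypothetical document.
doc = {"_id": 1, "scores": [80]}

def push(document, field, value):
    """Emulate { $push: { field: value } }: append one value; create the
    array if the field is missing."""
    document.setdefault(field, []).append(value)

def push_all(document, field, values):
    """Emulate { $pushAll: { field: values } }: append several values."""
    document.setdefault(field, []).extend(values)

push(doc, "scores", 85)            # scores -> [80, 85]
push_all(doc, "scores", [90, 92])  # scores -> [80, 85, 90, 92]
push(doc, "history", 1)            # creates history: [1]
```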
The $query operator provides an interface to describe queries. Consider the following operation:
db.collection.find( { $query: { age : 25 } } )
This is equivalent to the following db.collection.find() method that may be more familiar to you:
db.collection.find( { age : 25 } )
These operations return only those documents in the collection named collection where the age field equals 25.
The $regex operator provides regular expression capabilities in queries. MongoDB uses Perl compatible regular expressions (i.e. "PCRE"). The following examples are equivalent:
db.collection.find( { field: /acme.*corp/i } );
db.collection.find( { field: { $regex: 'acme.*corp', $options: 'i' } } );
These expressions match all documents in collection where the value of field matches the case-insensitive regular expression acme.*corp.
$regex uses “Perl Compatible Regular Expressions” (PCRE) as the matching engine.
$regex provides four option flags:
i toggles case insensitivity, and allows all letters in the pattern to match upper and lower cases.
m toggles multiline regular expression. Without this option, all regular expressions match within one line.
If there are no newline characters (e.g. \n) or no start/end of line construct, the m option has no effect.
x toggles an “extended” capability. When set, $regex ignores all white space characters unless escaped or included in a character class.
Additionally, it ignores characters between an un-escaped # character and the next new line, so that you may include comments in complicated patterns. This only applies to data characters; white space characters may never appear within special character sequences in a pattern.
The x option does not affect the handling of the VT character (i.e. code 11.)
s allows the dot character (i.e. .) to match all characters including newline characters. New in version 1.9.0.
$regex only provides the i and m options in the short JavaScript syntax (i.e. /acme.*corp/i). To use x and s you must use the “$regex” operator with the “$options” syntax.
To combine a regular expression match with other operators, you need to specify the “$regex” operator. For example:
db.collection.find( { field: { $regex: /acme.*corp/i, $nin: [ 'acmeblahcorp' ] } } );
This expression returns all instances of field in collection that match the case insensitive regular expression acme.*corp that don’t match acmeblahcorp.
$regex uses indexes only when the regular expression has an anchor for the beginning (i.e. ^) of a string. Additionally, while /^a/, /^a.*/, and /^a.*$/ are equivalent, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
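Since $regex uses PCRE, its option flags map closely onto Python's re module (i to re.IGNORECASE, m to re.MULTILINE, x to re.VERBOSE, s to re.DOTALL). The following sketch, over hypothetical field values, shows the i flag and an anchored pattern:

```python
import re

# Case-insensitive match, equivalent in spirit to
# { field: { $regex: 'acme.*corp', $options: 'i' } }.
pattern = re.compile(r'acme.*corp', re.IGNORECASE)

fields = ["ACME Widget Corp", "Acme corp", "globex corp"]
matched = [f for f in fields if pattern.search(f)]

# An anchored pattern (^) is the form that can use an index in MongoDB:
# the scan can stop as soon as the prefix fails to match.
anchored = re.compile(r'^acme', re.IGNORECASE)
```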
New in version 1.7.2.
Syntax: {$rename: { <old name1>: <new name1>, <old name2>: <new name2>, ... } }
The $rename operator updates the name of a field. The new field name must differ from the existing field name.
Consider the following example:
db.students.update( { _id: 1 }, { $rename: { 'nickname': 'alias', 'cell': 'mobile' } } )
This operation renames the field nickname to alias, and the field cell to mobile.
If the document already has a field with the new field name, the $rename operator removes that field and renames the field with the old field name to the new field name.
The $rename operator will expand arrays and sub-documents to find a match for field names. When renaming a field in a sub-document to another sub-document or to a regular field, the sub-document itself remains.
Consider the following examples involving the sub-document of the following document:
{ "_id": 1,
"alias": [ "The American Cincinnatus", "The American Fabius" ],
"mobile": "555-555-5555",
"nmae": { "first" : "george", "last" : "washington" }
}
To rename a sub-document, call the $rename operator with the name of the sub-document as you would any other field:
db.students.update( { _id: 1 }, { $rename: { "nmae": "name" } } )
This operation renames the sub-document nmae to name:
{ "_id": 1,
"alias": [ "The American Cincinnatus", "The American Fabius" ],
"mobile": "555-555-5555",
"name": { "first" : "george", "last" : "washington" }
}
To rename a field within a sub-document, call the $rename operator using the dot notation to refer to the field. Include the name of the sub-document in the new field name to ensure the field remains in the sub-document:
db.students.update( { _id: 1 }, { $rename: { "name.first": "name.fname" } } )
This operation renames the sub-document field first to fname:
{ "_id" : 1,
"alias" : [ "The American Cincinnatus", "The American Fabius" ],
"mobile" : "555-555-5555",
"name" : { "fname" : "george", "last" : "washington" }
}
To rename a field within a sub-document and move it to another sub-document, call the $rename operator using the dot notation to refer to the field. Include the name of the new sub-document in the new name:
db.students.update( { _id: 1 }, { $rename: { "name.last": "contact.lname" } } )
This operation renames the sub-document field last to lname and moves it to the sub-document contact:
{ "_id" : 1,
"alias" : [ "The American Cincinnatus", "The American Fabius" ],
"contact" : { "lname" : "washington" },
"mobile" : "555-555-5555",
"name" : { "fname" : "george" }
}
If the new field name does not include a sub-document name, the field moves out of the subdocument and becomes a regular document field.
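A Python sketch of the $rename semantics on a single hypothetical document, covering the overwrite-on-collision rule, the ignore-missing-field rule, and dotted names that move a field between sub-documents:

```python
# Hypothetical document, loosely following the examples above.
doc = {"_id": 1, "nickname": "gw", "cell": "555-555-5555",
       "name": {"first": "george", "last": "washington"}}

def rename(document, renames):
    """Emulate { $rename: { old: new, ... } } with dot notation on a
    single document: remove the old field and set the new one, dropping
    any existing field at the new name."""
    def resolve(path):
        *parents, leaf = path.split(".")
        target = document
        for p in parents:
            target = target.setdefault(p, {})
        return target, leaf

    for old, new in renames.items():
        old_parent, old_leaf = resolve(old)
        if old_leaf not in old_parent:
            continue                      # non-existing fields are ignored
        value = old_parent.pop(old_leaf)
        new_parent, new_leaf = resolve(new)
        new_parent[new_leaf] = value      # overwrites an existing field

rename(doc, {"nickname": "alias", "cell": "mobile"})
rename(doc, {"name.last": "contact.lname"})  # move between sub-documents
rename(doc, {"wife": "spouse"})              # no field wife: does nothing
```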
Consider the following behavior when the specified old field name does not exist:
When renaming a single field and the existing field name refers to a non-existing field, the $rename operator does nothing, as in the following:
db.students.update( { _id: 1 }, { $rename: { 'wife': 'spouse' } } )
This operation does nothing because there is no field named wife.
When renaming multiple fields and all of the old field names refer to non-existing fields, the $rename operator does nothing, as in the following:
db.students.update( { _id: 1 }, { $rename: { 'wife': 'spouse',
'vice': 'vp',
'office': 'term' } } )
This operation does nothing because there are no fields named wife, vice, and office.
When renaming multiple fields and some but not all old field names refer to non-existing fields, the $rename operator renames the fields with existing old field names and ignores the fields whose old field names refer to non-existing fields.
Changed in version 2.2.
Consider the following operation that renames both an existing field mobile and a non-existing field wife. The operation attempts to rename wife to alias, the name of an existing field:
db.students.update( { _id: 1 }, { $rename: { 'wife': 'alias',
'mobile': 'cell' } } )
This operation renames the mobile field to cell; no other action occurs:
{ "_id" : 1,
"alias" : [ "The American Cincinnatus", "The American Fabius" ],
"cell" : "555-555-5555",
"name" : { "lname" : "washington" },
"places" : { "d" : "Mt Vernon", "b" : "Colonial Beach" }
}
Note
Before version 2.2, when renaming multiple fields where only some (but not all) of the old field names referred to non-existing fields, the operation behaved differently:
Consider the following operation that renames both the field mobile, which exists, and the field wife, which does not exist. The operation tries to set the field named wife to alias, which is the name of an existing field:
db.students.update( { _id: 1 }, { $rename: { 'wife': 'alias', 'mobile': 'cell' } } )
Before 2.2, the operation renames the field mobile to cell and drops the alias field even though the field wife does not exist:
{ "_id" : 1,
"cell" : "555-555-5555",
"name" : { "lname" : "washington" },
"places" : { "d" : "Mt Vernon", "b" : "Colonial Beach" }
}
Only return the index key or keys for the results of the query. If $returnKey is set to true and the query does not use an index to perform the read operation, the returned documents will not contain any fields. Use one of the following forms:
db.collection.find( { <query> } )._addSpecial( "$returnKey", true )
db.collection.find( { $query: { <query> }, $returnKey: true } )
Use the $set operator to set a particular value. The $set operator requires the following syntax:
db.collection.update( { field: value1 }, { $set: { field1: value2 } } );
This statement updates the document in collection where field matches value1, replacing the value of the field field1 with value2. This operator will add the specified field or fields if they do not exist in this document or replace the existing value of the specified field(s) if they already exist.
The $showDiskLoc option adds a field $diskLoc to the returned documents. The $diskLoc field contains the disk location information.
The mongo shell provides the cursor.showDiskLoc() method:
db.collection.find().showDiskLoc()
You can also specify the option in either of the following forms:
db.collection.find( { <query> } )._addSpecial("$showDiskLoc" , true)
db.collection.find( { $query: { <query> }, $showDiskLoc: true } )
The $size operator matches any array with the number of elements specified by the argument. For example:
db.collection.find( { field: { $size: 2 } } );
returns all documents in collection where field is an array with 2 elements. For instance, the above expression will return { field: [ red, green ] } and { field: [ apple, lime ] } but not { field: fruit } or { field: [ orange, lemon, grapefruit ] }. To match fields with only one element within an array use $size with a value of 1, as follows:
db.collection.find( { field: { $size: 1 } } );
$size does not accept ranges of values. To select documents based on fields with different numbers of elements, create a counter field that you increment when you add elements to a field.
Queries cannot use indexes for the $size portion of a query, although the other portions of a query can use indexes if applicable.
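A Python sketch of the $size matching rule, using the example values from above:

```python
# Documents mirroring the $size examples above.
docs = [
    {"_id": 1, "field": ["red", "green"]},
    {"_id": 2, "field": ["apple", "lime"]},
    {"_id": 3, "field": "fruit"},
    {"_id": 4, "field": ["orange", "lemon", "grapefruit"]},
]

def size_match(documents, field, n):
    """Emulate { field: { $size: n } }: match only arrays with exactly
    n elements; non-array values never match."""
    return [d for d in documents
            if isinstance(d.get(field), list) and len(d[field]) == n]

two_elements = size_match(docs, "field", 2)  # _id 1 and 2 only
```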
The $snapshot operator prevents the cursor from returning a document more than once because an intervening write operation results in a move of the document.
Even in snapshot mode, objects inserted or deleted during the lifetime of the cursor may or may not be returned.
The mongo shell provides the cursor.snapshot() method:
db.collection.find().snapshot()
You can also specify the option in either of the following forms:
db.collection.find()._addSpecial( "$snapshot", true )
db.collection.find( { $query: {}, $snapshot: true } )
The $snapshot operator traverses the index on the _id field [1].
[1] You can achieve the $snapshot isolation behavior using any unique index on invariable fields.
Syntax: { field: { $type: <BSON type> } }
$type selects the documents where the value of the field is the specified BSON type.
Consider the following example:
db.inventory.find( { price: { $type : 1 } } )
This query will select all documents in the inventory collection where the price field value is a Double.
If the field holds an array, the $type operator performs the type check against the array elements and not the field.
Consider the following example where the tags field holds an array:
db.inventory.find( { tags: { $type : 4 } } )
This query will select all documents in the inventory collection where the tags array contains an element that is itself an array.
If instead you want to determine whether the tags field is an array type, use the $where operator:
db.inventory.find( { $where : "Array.isArray(this.tags)" } )
See the SERVER-1475 for more information about the array type.
Refer to the following table for the available BSON types and their corresponding numbers.
Type                      Number
Double                    1
String                    2
Object                    3
Array                     4
Binary data               5
Object id                 7
Boolean                   8
Date                      9
Null                      10
Regular Expression        11
JavaScript                13
Symbol                    14
JavaScript (with scope)   15
32-bit integer            16
Timestamp                 17
64-bit integer            18
Min key                   255
Max key                   127
MinKey and MaxKey compare less than and greater than all other possible BSON element values, respectively, and exist primarily for internal use.
Note
To query if a field value is a MinKey, you must use the $type with -1 as in the following example:
db.collection.find( { field: { $type: -1 } } )
Example
Consider the following example operation sequence that demonstrates both type comparison and the special MinKey and MaxKey values:
db.test.insert( {x : 3});
db.test.insert( {x : 2.9} );
db.test.insert( {x : new Date()} );
db.test.insert( {x : true } );
db.test.insert( {x : MaxKey } )
db.test.insert( {x : MinKey } )
db.test.find().sort({x:1})
{ "_id" : ObjectId("4b04094b7c65b846e2090112"), "x" : { $minKey : 1 } }
{ "_id" : ObjectId("4b03155dce8de6586fb002c7"), "x" : 2.9 }
{ "_id" : ObjectId("4b03154cce8de6586fb002c6"), "x" : 3 }
{ "_id" : ObjectId("4b031566ce8de6586fb002c9"), "x" : true }
{ "_id" : ObjectId("4b031563ce8de6586fb002c8"), "x" : "Tue Jul 25 2012 18:42:03 GMT-0500 (EST)" }
{ "_id" : ObjectId("4b0409487c65b846e2090111"), "x" : { $maxKey : 1 } }
To query for the minimum value of a shard key of a sharded cluster, use the following operation when connected to the mongos:
use config
db.chunks.find( { "min.shardKey": { $type: -1 } } )
Warning
Storing values of different types in the same field in a collection is strongly discouraged.
New in version 2.0.
For geospatial queries, MongoDB may return a single document more than once for a single query, because geospatial indexes may include multiple coordinate pairs in a single document, and therefore return the same document more than once.
The $uniqueDocs operator inverts the default behavior of the $within operator. By default, the $within operator returns the document only once. If you specify a value of false for $uniqueDocs, MongoDB will return multiple instances of a single document.
Example
Given an addressBook collection with a document in the following form:
{ addresses: [ { name: "Home", loc: [55.5, 42.3] }, { name: "Work", loc: [32.3, 44.2] } ] }
The following query would return the same document multiple times:
db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $uniqueDocs: false } } } )
The following query would return each matching document, only once:
db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $uniqueDocs: true } } } )
You cannot specify $uniqueDocs with $near or haystack queries.
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
The $unset operator deletes a particular field. Consider the following example:
db.collection.update( { field: value1 }, { $unset: { field1: "" } } );
The above example deletes field1 in collection from documents where field has a value of value1. The value specified for the field in the $unset statement (i.e. "" above) does not impact the operation.
If documents match the initial query (e.g. { field: value1 } above) but do not have the field specified in the $unset operation (e.g. field1), the statement has no effect on those documents.
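A Python sketch of the $unset semantics, covering both the matched-but-missing-field case and the unmatched case (documents hypothetical):

```python
# Hypothetical documents: _id 2 matches but lacks field1; _id 3 does not match.
docs = [
    {"_id": 1, "field": "value1", "field1": "x"},
    {"_id": 2, "field": "value1"},
    {"_id": 3, "field": "other", "field1": "y"},
]

def unset(documents, query_field, query_value, target):
    """Emulate update( { query_field: query_value },
    { $unset: { target: "" } } ) applied to every matching document
    (multi-document form, for illustration)."""
    for d in documents:
        if d.get(query_field) == query_value:
            d.pop(target, None)  # no-op when the target field is absent

unset(docs, "field", "value1", "field1")
```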
Use the $where operator to pass either a string containing a JavaScript expression or a full JavaScript function to the query system. The $where operator provides greater flexibility, but requires that the database process the JavaScript expression or function for each document in the collection. Reference the document in the JavaScript expression or function using either this or obj.
Consider the following examples:
db.myCollection.find( { $where: "this.credits == this.debits" } );
db.myCollection.find( { $where: "obj.credits == obj.debits" } );
db.myCollection.find( { $where: function() { return (this.credits == this.debits) } } );
db.myCollection.find( { $where: function() { return obj.credits == obj.debits; } } );
Additionally, if the query consists only of the $where operator, you can pass in just the JavaScript expression or JavaScript functions, as in the following examples:
db.myCollection.find( "this.credits == this.debits || this.credits > this.debits" );
db.myCollection.find( function() { return (this.credits == this.debits || this.credits > this.debits ) } );
You can include both the standard MongoDB operators and the $where operator in your query, as in the following examples:
db.myCollection.find( { active: true, $where: "this.credits - this.debits < 0" } );
db.myCollection.find( { active: true, $where: function() { return obj.credits - obj.debits < 0; } } );
Using normal non-$where query statements provides the following performance advantages: MongoDB will evaluate the non-$where components of the query before the $where statements, and non-$where query statements may use an index.
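The $where operator amounts to running an arbitrary per-document predicate. The following Python sketch (hypothetical collection) models that evaluation, including filtering on the normal clause first, which mirrors why non-$where clauses are cheaper:

```python
# Hypothetical collection mirroring the credits/debits examples.
my_collection = [
    {"_id": 1, "credits": 100, "debits": 100, "active": True},
    {"_id": 2, "credits": 50, "debits": 80, "active": True},
    {"_id": 3, "credits": 50, "debits": 80, "active": False},
]

def where(documents, predicate):
    """Emulate { $where: function } by running a predicate per document."""
    return [d for d in documents if predicate(d)]

balanced = where(my_collection,
                 lambda this: this["credits"] == this["debits"])

# { active: true, $where: ... }: apply the cheap equality clause first,
# then run the expensive predicate only on the remainder.
overdrawn = where(
    [d for d in my_collection if d["active"]],
    lambda this: this["credits"] - this["debits"] < 0,
)
```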
The $within operator allows you to select items that exist within a shape on a coordinate system for geospatial queries. This operator uses the following syntax:
db.collection.find( { location: { $within: { shape } } } );
Replace { shape } with a document that describes a shape. The $within operator supports three shapes. These shapes and the relevant expressions follow:
Rectangles. Use the $box operator. Consider the following $within document:
db.collection.find( { location: { $within: { $box: [[100,0], [120,100]] } } } );
Here a box, [[100,0], [120,100]], describes the parameter for the query. At a minimum, you must specify the lower-left and upper-right corners of the box.
Circles. Use the $center operator. Specify circles in the following form:
db.collection.find( { location: { $within: { $center: [ center, radius ] } } } );
Polygons. Use the $polygon operator. Specify polygons with an array of points. See the following example:
db.collection.find( { location: { $within: { $polygon: [[100,120], [100,100], [120,100], [240,200]] } } } );
The last point of a polygon is implicitly connected to the first point.
All shapes include the border of the shape as part of the shape, although this is subject to the imprecision of floating point numbers.
Use $uniqueDocs to control whether a document with multiple location fields appears multiple times in the results when more than one of those fields matches the query.
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
New in version 1.4.
The $box operator specifies a rectangular shape for the $within operator in geospatial queries. To use the $box operator, you must specify the bottom left and top right corners of the rectangle in an array object. Consider the following example:
db.collection.find( { loc: { $within: { $box: [ [0,0], [100,100] ] } } } )
This will return all the documents that are within the box having points at: [0,0], [0,100], [100,0], and [100,100].
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
New in version 1.9.
Use $polygon to specify a polygon for a bounded query using the $within operator for geospatial queries. To define the polygon, you must specify an array of coordinate points, as in the following:
[ [ x1,y1 ], [x2,y2], [x3,y3] ]
The last point specified is always implicitly connected to the first. You can specify as many points, and therefore sides, as you like. Consider the following bounded query for documents with coordinates within a polygon:
db.collection.find( { loc: { $within: { $polygon: [ [0,0], [3,6], [6,0] ] } } } )
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
New in version 1.4.
This specifies a circle shape for the $within operator in geospatial queries. To define the bounds of a query using $center, you must specify:
- the center point, and
- the radius
Consider the following example:
db.collection.find( { location: { $within: { $center: [ [0,0], 10 ] } } } );
The above command returns all the documents that fall within a 10 unit radius of the point [0,0].
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
New in version 2.0.
For geospatial queries, MongoDB may return a single document more than once for a single query, because geospatial indexes may include multiple coordinate pairs in a single document, and therefore return the same document more than once.
The $uniqueDocs operator controls this behavior for the $within operator. By default, $within returns each matching document only once. If you specify a value of false for $uniqueDocs, MongoDB will return multiple instances of a single document.
Example
Given an addressBook collection with a document in the following form:
{ addresses: [ { name: "Home", loc: [55.5, 42.3] }, { name: "Work", loc: [32.3, 44.2] } ] }
The following query would return the same document multiple times:
db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $uniqueDocs: false } } } )
The following query would return each matching document, only once:
db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $uniqueDocs: true } } } )
You cannot specify $uniqueDocs with $near or haystack queries.
Note
A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
Projection operators:
See also
New in version 2.2.
Use the $elemMatch projection operator to limit the response of a query to a single matching element of an array. Consider the following:
Example
Given the following document fragment:
{
_id: ObjectId(),
zipcode: 63109,
dependents: [
{ name: "john", school: 102, age: 10 },
{ name: "jess", school: 102, age: 11 },
{ name: "jeff", school: 108, age: 15 }
]
}
Consider the following find() operation:
var projection = { _id: 0, dependents: { $elemMatch: { school: 102 }}};
db.students.find( { zipcode: 63109 }, projection);
The query returns all documents where the value of the zipcode field is 63109, while the projection excludes the _id field and includes only the first matching element of the dependents array where the school field has a value of 102. The documents would take the following form:
{
dependents: [
{ name: "john", school: 102, age: 10 }
]
}
Note
The $elemMatch projection will only match one array element per source document.
The $slice operator controls the number of items of an array that a query returns. Consider the following prototype query:
db.collection.find( { field: value }, { array: {$slice: count } } );
This operation selects documents in collection where the field named field holds value, and returns the number of elements specified by the value of count from the array stored in the array field. If count has a value greater than the number of elements in array, the query returns all elements of the array.
$slice accepts arguments in a number of formats, including negative values and arrays. Consider the following examples:
db.posts.find( {}, { comments: { $slice: 5 } } )
Here, $slice selects the first five items in an array in the comments field.
db.posts.find( {}, { comments: { $slice: -5 } } )
This operation returns the last five items in array.
The following examples specify an array as an argument to $slice. Arrays take the form of [ skip, limit ], where the first value indicates the number of items in the array to skip and the second value indicates the number of items to return.
db.posts.find( {}, { comments: { $slice: [ 20, 10 ] } } )
Here, the query will only return 10 items, after skipping the first 20 items of that array.
db.posts.find( {}, { comments: { $slice: [ -20, 10 ] } } )
This operation returns 10 items as well, beginning with the item that is 20th from the last item of the array.
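The argument forms above map closely onto JavaScript's own array slicing. The following is a minimal plain-JavaScript sketch of the $slice semantics, for illustration only, not the server implementation:

```javascript
// Sketch of $slice projection semantics on a plain array.
function slice(arr, spec) {
  if (typeof spec === "number") {
    // Positive count: first N items; negative count: last N items.
    return spec >= 0 ? arr.slice(0, spec) : arr.slice(spec);
  }
  const [skip, limit] = spec;
  // A negative skip counts backward from the end of the array.
  const start = skip < 0 ? Math.max(arr.length + skip, 0) : skip;
  return arr.slice(start, start + limit);
}

const comments = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
slice(comments, 5);       // first five items → [1, 2, 3, 4, 5]
slice(comments, -5);      // last five items → [6, 7, 8, 9, 10]
slice(comments, [2, 3]);  // skip 2, return 3 → [3, 4, 5]
```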
Aggregation operators:
Takes an array of one or more numbers and adds them together, returning the sum.
Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.
Returns the average of all the values of the field in all documents selected by this group.
Takes two values in an array and returns an integer. The returned value is:
Use the $cond operator with the following syntax:
{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond returns the value of the second expression. If the first expression evaluates to false, $cond evaluates and returns the third expression.
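The three-argument form is essentially a ternary. A plain-JavaScript sketch of the semantics follows (illustration only; unlike the real operator, this sketch receives both branch values already evaluated):

```javascript
// Sketch of $cond: [ boolean, true-case, false-case ].
function cond([test, trueCase, falseCase]) {
  return test ? trueCase : falseCase;
}

cond([true, "qualified", "not qualified"]);   // "qualified"
cond([false, "qualified", "not qualified"]);  // "not qualified"
```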
Takes a date and returns the day of the month as a number between 1 and 31.
Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday).
Takes a date and returns the day of the year as a number between 1 and 366.
Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.
Takes two values in an array and returns a boolean. The returned value is:
Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis.
The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate keys may resemble the following document:
{ _id : { author: '$author', pageViews: '$pageViews', posted: '$posted' } }
With the exception of the _id field, $group cannot output nested documents.
Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.
Consider the following example:
db.article.aggregate(
{ $group : {
_id : "$author",
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}}
);
This groups by the author field and computes two fields. The first, docsPerAuthor, is a counter field that adds one for each document with a given author field, using the $sum function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.
Each field defined for the $group must use one of the group aggregation functions listed below to generate its composite value:
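The effect of that $group stage can be sketched in plain JavaScript, using the same field names as the example above (an illustration of the semantics, not the server implementation):

```javascript
// Sketch of the $group stage: group documents by author, counting
// documents ($sum : 1) and summing pageViews ($sum : "$pageViews").
function groupByAuthor(docs) {
  const groups = {};
  for (const doc of docs) {
    const g = groups[doc.author] ||
      (groups[doc.author] = { _id: doc.author, docsPerAuthor: 0, viewsPerAuthor: 0 });
    g.docsPerAuthor += 1;               // $sum : 1
    g.viewsPerAuthor += doc.pageViews;  // $sum : "$pageViews"
  }
  return Object.values(groups);
}

groupByAuthor([
  { author: "dave", pageViews: 5 },
  { author: "dave", pageViews: 7 },
  { author: "jane", pageViews: 6 }
]);
// → [ { _id: "dave", docsPerAuthor: 2, viewsPerAuthor: 12 },
//     { _id: "jane", docsPerAuthor: 1, viewsPerAuthor: 6 } ]
```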
Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.
Returns the first value it encounters for its group.
Returns the last value it encounters for its group.
Returns the highest value among all values of the field in all documents selected by this group.
Returns the lowest value among all values of the field in all documents selected by this group.
Returns the average of all the values of the field in all documents selected by this group.
Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.
Returns the sum of all the values for a specified field in the grouped documents, as in the second use above.
Alternately, if you specify a value as an argument, $sum will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.
Warning
The aggregation system currently stores $group operations in memory, which may cause problems when processing a large number of groups.
Takes two values in an array and returns an integer. The returned value is:
Takes two values in an array and returns an integer. The returned value is:
Restricts the number of documents that pass through the $limit in the pipeline.
$limit takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:
db.article.aggregate(
{ $limit : 5 }
);
This operation returns only the first 5 documents passed to it by the pipeline. $limit has no effect on the content of the documents it passes.
Takes two values in an array and returns an integer. The returned value is:
Takes two values in an array and returns an integer. The returned value is:
Provides a query-like interface to filter documents out of the aggregation pipeline. The $match drops documents that do not match the condition from the aggregation pipeline, and it passes documents that match along the pipeline unaltered.
The syntax passed to the $match is identical to the query syntax. Consider the following prototype form:
db.article.aggregate(
{ $match : <match-predicate> }
);
The following example performs a simple field equality test:
db.article.aggregate(
{ $match : { author : "dave" } }
);
This operation only returns documents where the author field holds the value dave. Consider the following example, which performs a range test:
db.article.aggregate(
{ $match : { score : { $gt : 50, $lte : 90 } } }
);
Here, all documents return when the score field holds a value that is greater than 50 and less than or equal to 90.
Note
Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of later processing. If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() or db.collection.findOne().
Warning
You cannot use $where or geospatial operations in $match queries as part of the aggregation pipeline.
Returns the highest value among all values of the field in all documents selected by this group.
Returns the lowest value among all values of the field in all documents selected by this group.
Takes an array of one or more numbers and multiplies them, returning the resulting product.
Takes two values in an array and returns an integer. The returned value is:
Reshapes a document stream by renaming, adding, or removing fields. Also use $project to create computed values or sub-objects. Use $project to:
Use $project to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.
db.article.aggregate(
{ $project : {
title : 1 ,
author : 1 ,
}}
);
This operation includes the title field and the author field in the document that returns from the aggregation pipeline.
Note
The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate(
{ $project : {
_id : 0 ,
title : 1 ,
author : 1
}}
);
Here, the projection excludes the _id field but includes the title and author fields.
Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators. Consider the following example:
db.article.aggregate(
{ $project : {
title : 1,
doctoredPageViews : { $add:["$pageViews", 10] }
}}
);
Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add.
Note
You must enclose the expression that defines the computed field in braces, so that the expression is a valid object.
You may also use $project to rename fields. Consider the following example:
db.article.aggregate(
{ $project : {
title : 1 ,
page_views : "$pageViews" ,
bar : "$other.foo"
}}
);
This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents.
Finally, you can use the $project to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate(
{ $project : {
title : 1 ,
stats : {
pv : "$pageViews",
foo : "$other.foo",
dpv : { $add:["$pageViews", 10] }
}
}}
);
This projection includes the title field and places $project into “inclusive” mode. Then, it creates the stats document with the following fields:
Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.
Takes a date and returns the second as a number between 0 and 59, although it can be 60 to account for leap seconds.
Skips over the specified number of documents that pass through the $skip in the pipeline before passing all of the remaining input.
$skip takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:
db.article.aggregate(
{ $skip : 5 }
);
This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the content of the documents it passes along the pipeline.
The $sort pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:
db.<collection-name>.aggregate(
{ $sort : { <sort-key> } }
);
This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document.
Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
db.users.aggregate(
{ $sort : { age : -1, posts: 1 } }
);
This operation sorts the documents in the users collection, in descending order according by the age field and then in ascending order according to the value in the posts field.
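The compound sort specification above can be sketched as a plain-JavaScript comparator, for illustration of the ordering semantics only:

```javascript
// Sketch of { $sort : { age : -1, posts : 1 } }: descending by age,
// then ascending by posts for documents with equal ages.
function byAgeDescPostsAsc(a, b) {
  if (a.age !== b.age) return b.age - a.age;  // age : -1
  return a.posts - b.posts;                   // posts : 1
}

[ { age: 30, posts: 5 }, { age: 40, posts: 2 }, { age: 30, posts: 1 } ]
  .sort(byAgeDescPostsAsc);
// → the age-40 user first, then the two age-30 users ordered by posts ascending
```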
Note
The $sort cannot begin sorting documents until previous operators in the pipeline have returned all output.
The $sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators:
Warning
Unless the $sort operator can use an index, in the current release, the sort must fit within memory. This may cause problems when sorting large numbers of documents.
Takes in two strings. Returns a number. $strcasecmp is positive if the first string is “greater than” the second and negative if the first string is “less than” the second. $strcasecmp returns 0 if the strings are identical.
Note
$strcasecmp may not make sense when applied to glyphs outside the Roman alphabet.
$strcasecmp internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp for a case-sensitive comparison.
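A minimal plain-JavaScript sketch of this comparison, for illustration only (the server's actual comparison is byte-wise):

```javascript
// Sketch of $strcasecmp: uppercase both strings, then compare.
function strcasecmp(a, b) {
  const x = a.toUpperCase(), y = b.toUpperCase();
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

strcasecmp("Hello", "hello");  // 0 — identical ignoring case
strcasecmp("banana", "Apple"); // 1 — first string is "greater than" the second
```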
$substr takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.
Note
$substr is not encoding aware and if used improperly may produce a result string containing an invalid UTF-8 character sequence.
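The byte-based behavior can be sketched with Node's Buffer, for illustration only; the multi-byte case shows why improper offsets can split a UTF-8 sequence:

```javascript
// Sketch of $substr: skip and length count bytes, not characters.
function substrBytes(str, skip, length) {
  return Buffer.from(str, "utf8")
    .subarray(skip, skip + length)
    .toString("utf8");
}

substrBytes("hello world", 6, 5); // "world"
// "é" occupies two bytes in UTF-8, so taking a single byte from it
// yields an invalid (replacement-character) result.
```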
Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.
Returns the sum of all the values for a specified field in the grouped documents, as in the second use above.
Alternately, if you specify a value as an argument, $sum will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.
Peels off the elements of an array individually, and returns a stream of documents. $unwind returns one document for every member of the unwound array within every source document. Take the following aggregation command:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
Note
The dollar sign (i.e. $) must precede the field specification handed to the $unwind operator.
In the above aggregation $project selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"ok" : 1
}
A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original “tags” array.
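This unwinding can be sketched in plain JavaScript, for illustration of the semantics only:

```javascript
// Sketch of $unwind: emit one output document per element of the
// named array field, copying all other fields unchanged.
function unwind(docs, field) {
  return docs.flatMap(doc =>
    doc[field].map(value => ({ ...doc, [field]: value })));
}

unwind([{ _id: 1, title: "this is my title", tags: ["fun", "good", "fun"] }], "tags");
// → three documents, identical except that tags is "fun", "good", and "fun"
```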
Note
$unwind has the following behaviors:
Takes a date and returns the week of the year as a number between 0 and 53.
Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the “%U” format of the strftime standard library function.
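The %U calculation can be sketched in plain JavaScript using the classic formula (0-based day of year plus days remaining in the first partial week, divided by 7); this is an illustration of the rule, not the server code:

```javascript
// Sketch of the %U-style week-of-year: weeks begin on Sunday;
// days before the first Sunday of the year fall in week 0.
function weekOfYear(d) {
  const jan1 = Date.UTC(d.getUTCFullYear(), 0, 1);
  const yday = Math.floor((d.getTime() - jan1) / 86400000); // 0-based day of year
  return Math.floor((yday + 7 - d.getUTCDay()) / 7);        // getUTCDay(): 0 = Sunday
}

weekOfYear(new Date(Date.UTC(2012, 0, 1))); // 1 (Jan 1, 2012 was a Sunday)
weekOfYear(new Date(Date.UTC(2011, 0, 1))); // 0 (Jan 1, 2011 was a Saturday)
```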
Use the addShard command to add a database instance or replica set to a sharded cluster. You must run this command when connected to a mongos instance.
The command takes the following form:
{ addShard: "<hostname>:<port>" }
Example
db.runCommand({addShard: "mongodb0.example.net:27027"})
Replace <hostname>:<port> with the hostname and port of the database instance you want to add as a shard.
Warning
Do not use localhost for the hostname unless your configuration server is also running on localhost.
The optimal configuration is to deploy shards across replica sets. To add a shard on a replica set you must specify the name of the replica set and the hostname of at least one member of the replica set. You must specify at least one member of the set, but can specify all members in the set or another subset if desired. addShard takes the following form:
{ addShard: "replica-set/hostname:port" }
Example
db.runCommand( { addShard: "repl0/mongodb3.example.net:27327"} )
If you specify additional hostnames, all must be members of the same replica set.
Send this command to only one mongos instance; it will store shard configuration information in the config database.
Note
Specify a maxSize when you have machines with different disk capacities, or if you want to limit the amount of data on some shards.
The maxSize constraint prevents the balancer from migrating chunks to the shard when the value of mem.mapped exceeds the value of maxSize.
New in version 2.1.0.
aggregate implements the aggregation framework. Consider the following prototype form:
{ aggregate: "[collection]", pipeline: [pipeline] }
Where [collection] specifies the name of the collection that contains the data that you wish to aggregate. The pipeline argument holds an array that contains the specification for the aggregation operation. Consider the following example from the aggregation documentation.
db.runCommand(
{ aggregate : "article", pipeline : [
{ $project : {
author : 1,
tags : 1,
} },
{ $unwind : "$tags" },
{ $group : {
_id : "$tags",
authors : { $addToSet : "$author" }
} }
] }
);
More typically this operation would use the aggregate helper in the mongo shell, and would resemble the following:
db.article.aggregate(
{ $project : {
author : 1,
tags : 1,
} },
{ $unwind : "$tags" },
{ $group : {
_id : "$tags",
authors : { $addToSet : "$author" }
} }
);
For more aggregation documentation, please see:
applyOps provides a way to apply entries from an oplog created by replica set members and master instances in a master/slave deployment. applyOps is primarily an internal command to support sharding functionality, and has the following prototype form:
db.runCommand( { applyOps: [ <operations> ], preCondition: [ { ns: <namespace>, q: <query>, res: <result> } ] } )
applyOps applies oplog entries from the <operations> array to the current mongod instance. The preCondition array provides the ability to specify conditions that must be true in order to apply the oplog entry.
You can specify as many preCondition sets as needed. If you specify the ns option, applyOps will only apply oplog entries for the collection described by that namespace. You may also specify a query in the q field with a corresponding expected result in the res field that must match in order to apply the oplog entry.
Warning
This command obtains a global write lock and will block other operations until it has completed.
Clients use authenticate to authenticate a connection. When using the shell, use the command helper as follows:
db.authenticate( "username", "password" )
availableQueryOptions is an internal command that is only available on mongos instances.
The buildInfo command is an administrative command which returns a build summary for the current mongod.
{ buildInfo: 1 }
The information provided includes the following:
checkShardingIndex is an internal command that supports the sharding functionality.
The clone command clones a database from a remote MongoDB instance to the current host. clone copies the database on the remote instance with the same name as the current database. The command takes the following form:
{ clone: "db1.example.net:27017" }
Replace db1.example.net:27017 above with the resolvable hostname for the MongoDB instance you wish to copy from. Note the following behaviors:
See copydb for similar functionality.
Warning
This command obtains an intermittent write-lock on the destination server, which can block other operations until it completes.
The cloneCollection command copies a collection from a remote server to the server where you run the command.
Consider the following example:
{ cloneCollection: "users", from: "db.example.net:27017", query: { active: true }, copyIndexes: false }
This operation copies the “users” collection from the current database on the server at db.example.net. The operation only copies documents that satisfy the query { active: true } and does not copy indexes. cloneCollection copies indexes by default, but you can disable this behavior by setting { copyIndexes: false }. The query and copyIndexes arguments are optional.
cloneCollection creates a collection on the current database with the same name as the origin collection. If, in the above example, the users collection already exists, then MongoDB appends documents in the remote collection to the destination collection.
The cloneCollectionAsCapped command creates a new capped collection from an existing, non-capped collection within the same database. The operation does not affect the original non-capped collection.
The command has the following syntax:
{ cloneCollectionAsCapped: <existing collection>, toCollection: <capped collection>, size: <capped size> }
The command copies an existing collection and creates a new capped collection with a maximum size specified by the capped size in bytes. The name of the new capped collection must be distinct and cannot be the same as that of the original existing collection. To replace the original non-capped collection with a capped collection, use the convertToCapped command.
During the cloning, the cloneCollectionAsCapped command exhibits the following behavior:
closeAllDatabases is an internal command that invalidates all cursors and closes the open database files. The next operation that uses the database will reopen the file.
Warning
This command obtains a global write lock and will block other operations until it has completed.
New in version 2.2.
collMod makes it possible to add flags to a collection to modify the behavior of MongoDB. In the current release the only available flag is usePowerOf2Sizes. The command takes the following prototype form:
db.runCommand( {"collMod" : <collection> , "<flag>" : <value> } )
In this command substitute <collection> with the name of a collection in the current database, and <flag> and <value> with the flag and value you want to set.
The usePowerOf2Sizes flag changes the method that MongoDB uses to allocate space on disk for documents in this collection. By setting usePowerOf2Sizes, you ensure that MongoDB will allocate space for documents in sizes that are powers of 2 (e.g. 4, 8, 16, 32, 64, 128, 256, 512...8388608). With this option MongoDB will be able to more effectively reuse space.
usePowerOf2Sizes is useful for collections where you will be inserting and deleting large numbers of documents to ensure that MongoDB will effectively use space on disk.
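Conceptually, the strategy rounds each record's allocation up to the next power of 2 so that freed records can be reused by later documents of similar size. The following sketch illustrates the rounding principle only; the minimum bucket size shown is an assumption, not the server's exact allocator:

```javascript
// Sketch of power-of-2 record allocation: round a document size up
// to the next power of 2 (32-byte minimum bucket assumed here).
function powerOf2Allocation(bytes) {
  let size = 32;
  while (size < bytes) size *= 2;
  return size;
}

powerOf2Allocation(100); // 128
powerOf2Allocation(600); // 1024
```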
Example
To enable usePowerOf2Sizes on the collection named sensor_readings, use the following operation:
db.runCommand({collMod: "sensor_readings", usePowerOf2Sizes:true })
To disable usePowerOf2Sizes on the collection products, use the following operation:
db.runCommand( { collMod: "products", "usePowerOf2Sizes": false })
Warning
Changed in version 2.2.1: usePowerOf2Sizes now supports documents larger than 8 megabytes. If you enable usePowerOf2Sizes you must use at least version 2.2.1.
usePowerOf2Sizes only affects subsequent allocations caused by document insertion or record relocation as a result of document growth, and does not affect existing allocations.
The collStats command returns a variety of storage statistics for a given collection. Use the following syntax:
{ collStats: "database.collection" , scale : 1024 }
Specify a namespace database.collection and use the scale argument to scale the output. The above example will display values in kilobytes.
Examine the following example output, which uses the db.collection.stats() helper in the mongo shell.
> db.users.stats()
{
"ns" : "app.users", // namespace
"count" : 9, // number of documents
"size" : 432, // collection size in bytes
"avgObjSize" : 48, // average object size in bytes
"storageSize" : 3840, // (pre)allocated space for the collection
"numExtents" : 1, // number of extents (contiguously allocated chunks of datafile space)
"nindexes" : 2, // number of indexes
"lastExtentSize" : 3840, // size of the most recently created extent
"paddingFactor" : 1, // padding can speed up updates if documents grow
"flags" : 1,
"totalIndexSize" : 16384, // total index size in bytes
"indexSizes" : { // size of specific indexes in bytes
"_id_" : 8192,
"username" : 8192
},
"ok" : 1
}
Note
The scale factor rounds values to whole numbers. This can produce unpredictable and unexpected results in some situations.
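A short sketch of why scaling can surprise, using the example values above (illustration only; this assumes the output is truncated to a whole number, matching the rounding described in the note):

```javascript
// Sketch of scaled collStats output: divide by the scale factor
// and truncate to a whole number.
function scaled(bytes, scale) {
  return Math.floor(bytes / scale);
}

scaled(432, 1024);  // 0 — a 432-byte collection reports a size of 0 in kilobytes
scaled(3840, 1024); // 3 — 3840 bytes of storage reports as 3 kilobytes
```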
See also
New in version 2.0.
The compact command rewrites and defragments a single collection. Additionally, the command drops all indexes at the beginning of compaction and rebuilds the indexes at the end. compact is conceptually similar to repairDatabase, but works on a single collection rather than an entire database.
The command has the following syntax:
{ compact: <collection name> }
You may also specify the following options:
Warning
Always have an up-to-date backup before performing server maintenance such as the compact operation.
Note the following behaviors:
compact blocks all other activity. In MongoDB 2.2, compact blocks activities only for its database. You may view the intermediate progress either by viewing the mongod log file, or by running the db.currentOp() in another shell instance.
compact removes any padding factor in the collection when issued without either the paddingFactor option or the paddingBytes option. This may impact performance if the documents grow regularly. However, compact retains existing paddingFactor statistics for the collection that MongoDB will use to calculate the padding factor for future inserts.
compact generally uses less disk space than repairDatabase and is faster. However, the compact command is still slow and does block other database use. Only use compact during scheduled maintenance periods.
If you terminate the operation with the db.killOp() method or restart the server before it has finished:
compact may increase the total size and number of your data files, especially when run for the first time. However, this will not increase the total collection storage space, since storage size is the amount of data allocated within the database files, not the size or number of the files on the file system.
compact requires a small amount of additional disk space while running but unlike repairDatabase it does not free space on the file system.
You may also wish to run the collStats command before and after compaction to see how the storage space changes for the collection.
compact commands do not replicate to secondaries in a replica set:
Compact each member separately.
Ideally, compaction runs on a secondary. See option force:true above for information regarding compacting the primary.
If you run compact on a secondary, the secondary will enter a “recovering” state to prevent clients from sending read operations during compaction. Once the compaction finishes the secondary will automatically return to secondary state.
You may refer to the partial script for automating step down and compaction for an example.
compact is a command issued to a mongod. In a sharded environment, run compact on each shard separately as a maintenance operation.
It is not possible to compact capped collections because they don’t have padding, and documents cannot grow in these collections. However, the documents of a capped collection are not subject to fragmentation.
See also
Note
connPoolStats only returns meaningful results for mongos instances and for mongod instances in sharded clusters.
The command connPoolStats returns information regarding the number of open connections to the current database instance, including client connections and server-to-server connections for replication and clustering. The command takes the following form:
{ connPoolStats: 1 }
The value of the argument (i.e. 1) does not affect the output of the command. See Connection Pool Statistics Reference for full documentation of the connPoolStats output.
connPoolSync is an internal command.
The convertToCapped command converts an existing, non-capped collection to a capped collection within the same database.
The command has the following syntax:
{convertToCapped: <collection>, size: <capped size> }
convertToCapped takes an existing collection (<collection>) and transforms it into a capped collection with a maximum size in bytes, specified to the size argument (<capped size>).
During the conversion process, the convertToCapped command exhibits the following behavior:
Note
MongoDB does not support the convertToCapped command in a sharded cluster.
Warning
The convertToCapped command will not recreate indexes from the original collection on the new collection. If you need indexes on this collection you will need to create these indexes after the conversion is complete.
See also
Warning
This command obtains a global write lock and will block other operations until it has completed.
The copydb command copies a database from a remote host to the current host. The command has the following syntax:
{ copydb: 1,
fromhost: <hostname>,
fromdb: <db>,
todb: <db>,
slaveOk: <bool>,
username: <username>,
password: <password>,
nonce: <nonce>,
key: <key> }
All of the following arguments are optional:
You can omit the fromhost argument, to copy one database to another database within a single MongoDB instance.
You must run this command on the destination, or the todb server.
Be aware of the following behaviors:
copydb can run against a slave or a non-primary member of a replica set. In this case, you must set the slaveOk option to true.
copydb does not snapshot the database. If the state of the database changes at any point during the operation, the resulting database may be inconsistent.
You must run copydb on the destination server.
The destination server is not locked for the duration of the copydb operation. This means that copydb will occasionally yield to allow other operations to complete.
If the remote server has authentication enabled, then you must include a username and password. You must also include a nonce and a key. The nonce is a one-time password that you request from the remote server using the copydbgetnonce command. The key is a hash generated as follows:
hex_md5(nonce + username + hex_md5(username + ":mongo:" + pass))
If you need to copy a database and authenticate, it’s easiest to use the shell helper:
db.copyDatabase(<remote_db_name>, <local_db_name>, <from_host_name>, <username>, <password>)
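The key derivation above can be sketched in Python. This is a minimal illustration of the hex_md5 formula only; the helper names hex_md5 and copydb_key are hypothetical:

```python
import hashlib

def hex_md5(s):
    """Hex-encoded MD5 digest of a UTF-8 string, matching the shell's hex_md5."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def copydb_key(nonce, username, password):
    # key = hex_md5(nonce + username + hex_md5(username + ":mongo:" + pass))
    pwd_digest = hex_md5(username + ":mongo:" + password)
    return hex_md5(nonce + username + pwd_digest)

key = copydb_key("abc123", "admin", "secret")  # a 32-character hex string
```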
Client libraries use copydbgetnonce to get a one-time password for use with the copydb command.
Note
This command obtains a write lock on the affected database and will block other operations until it has completed; however, the write lock for this operation is short lived.
The count command counts the number of documents in a collection. The command returns a document that contains the count as well as the command status. The count command takes the following prototype form:
{ count: <collection>, query: <query>, limit: <limit>, skip: <skip> }
The command fields are count (the name of the collection to count), query (an optional selector that restricts the count to matching documents), limit (the maximum number of matching documents to count), and skip (the number of matching documents to skip before counting).
Consider the following examples of the count command:
Count the number of all documents in the orders collection:
db.runCommand( { count: 'orders' } )
In the result, the n, which represents the count, is 26 and the command status ok is 1:
{ "n" : 26, "ok" : 1 }
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012'):
db.runCommand( { count:'orders',
query: { ord_dt: { $gt: new Date('01/01/2012') } }
} )
In the result, the n, which represents the count, is 13 and the command status ok is 1:
{ "n" : 13, "ok" : 1 }
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012') skipping the first 10 matching records:
db.runCommand( { count:'orders',
query: { ord_dt: { $gt: new Date('01/01/2012') } },
skip: 10 } )
In the result, the n, which represents the count, is 3 and the command status ok is 1:
{ "n" : 3, "ok" : 1 }
Note
MongoDB also provides the cursor.count() method and the shell wrapper db.collection.count() method.
The create command explicitly creates a collection. The command uses the following syntax:
{ create: <collection_name> }
To create a capped collection limited to 40 KB, issue a command in the following form:
{ create: "collection", capped: true, size: 40 * 1024 }
The options for creating capped collections are capped (set to true), size (the maximum size in bytes), and max (the maximum number of documents).
The db.createCollection() method provides a wrapper around this functionality.
Note
This command obtains a write lock on the affected database and will block other operations until it has completed. The write lock for this operation is typically short lived; however, allocations for large capped collections may take longer.
The cursorInfo command returns information about current cursor allotment and use. Use the following form:
{ cursorInfo: 1 }
The value (e.g. 1 above) does not affect the output of the command.
cursorInfo returns the total number of open cursors (totalOpen), the size of client cursors in current use (clientCursors_size), and the number of timed out cursors since the last server restart (timedOut).
For internal use.
The dataSize command returns the data size for a set of data within a certain range:
{ dataSize: "database.collection", keyPattern: { field: 1 }, min: { field: 10 }, max: { field: 100 } }
This will return a document that contains the size of all matching documents. Replace the database.collection value with the database and collection from your deployment. The keyPattern, min, and max parameters are optional.
The amount of time required to return dataSize depends on the amount of data in the collection.
The dbStats command returns storage statistics for a given database. The command takes the following syntax:
{ dbStats: 1, scale: 1 }
The value of the argument (e.g. 1 above) to dbStats does not affect the output of the command. The scale option allows you to specify how to scale byte values. For example, a scale value of 1024 will display the results in kilobytes rather than in bytes.
The time required to run the command depends on the total size of the database. Because the command has to touch all data files, the command may take several seconds to run.
In the mongo shell, the db.stats() function provides a wrapper around this functionality. See the “Database Statistics Reference” document for an overview of this output.
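The effect of the scale option can be sketched in Python. The truncation to whole numbers is an assumption based on the rounding caveat these statistics commands carry; the function name scaled is hypothetical:

```python
def scaled(size_bytes, scale=1):
    # dbStats divides byte counts by the scale factor and reports
    # whole numbers, so fractional units are lost (assumed floor
    # behavior; e.g. 4.5 KB of data reports as 4 with scale=1024).
    return size_bytes // scale

kb = scaled(4608, 1024)       # 4608 bytes reported as 4 kilobytes
raw = scaled(1000)            # default scale of 1 leaves bytes unchanged
```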
diagLogging is an internal command.
Warning
This command obtains a write lock on the affected database and will block other operations until it has completed.
The distinct command finds the distinct values for a specified field across a single collection. The command returns a document that contains an array of the distinct values as well as the query plan and status. The command takes the following prototype form:
{ distinct: collection, key: <field>, query: <query> }
The command fields are distinct (the name of the collection), key (the field for which to return distinct values), and query (an optional selector that restricts the documents considered).
Consider the following examples of the distinct command:
Return an array of the distinct values of the field ord_dt from all documents in the orders collection:
db.runCommand ( { distinct: 'orders', key: 'ord_dt' } )
Return an array of the distinct values of the field sku in the subdocument item from all documents in the orders collection:
db.runCommand ( { distinct: 'orders', key: 'item.sku' } )
Return an array of the distinct values of the field ord_dt from the documents in the orders collection where the price is greater than 10:
db.runCommand ( { distinct: 'orders',
key: 'ord_dt',
query: { price: { $gt: 10 } }
} )
Note
driverOIDTest is an internal command.
The drop command removes an entire collection from a database. The command has the following syntax:
{ drop: <collection_name> }
The mongo shell provides the equivalent helper method:
db.collection.drop();
Note that this command also removes any indexes associated with the dropped collection.
Warning
This command obtains a write lock on the affected database and will block other operations until it has completed.
The dropDatabase command drops a database, deleting the associated data files. dropDatabase operates on the current database.
In the shell issue the use <database> command, replacing <database> with the name of the database you wish to delete. Then use the following command form:
{ dropDatabase: 1 }
The mongo shell also provides the following equivalent helper method:
db.dropDatabase();
Warning
This command obtains a global write lock and will block other operations until it has completed.
The dropIndexes command drops one or all indexes from the current collection. To drop all indexes, issue the command like so:
{ dropIndexes: "collection", index: "*" }
To drop a single index, issue the command by specifying the name of the index you want to drop. For example, to drop the index named age_1, use the following command:
{ dropIndexes: "collection", index: "age_1" }
The shell provides a useful command helper. Here’s the equivalent command:
db.collection.dropIndex("age_1");
Warning
This command obtains a write lock on the affected database and will block other operations until it has completed.
The emptycapped command removes all documents from a capped collection. Use the following syntax:
{ emptycapped: "events" }
This command removes all records from the capped collection named events.
Warning
This command obtains a write lock on the affected database and will block other operations until it has completed.
The enableSharding command enables sharding on a per-database level. Use the following command form:
{ enableSharding: "<database name>" }
Once you’ve enabled sharding in a database, you can use the shardCollection command to begin the process of distributing data among the shards.
The eval command evaluates JavaScript functions on the database server and has the following form:
{
eval: <function>,
args: [ <arg1>, <arg2> ... ],
nolock: <boolean>
}
The command contains the following fields: eval (the JavaScript function to evaluate), args (an array of arguments to pass to the function), and nolock (a boolean that, when true, prevents eval from taking a write lock).
Consider the following example which uses eval to perform an increment and calculate the average on the server:
db.runCommand( {
eval: function(name, incAmount) {
var doc = db.myCollection.findOne( { name : name } );
doc = doc || { name : name , num : 0 , total : 0 , avg : 0 };
doc.num++;
doc.total += incAmount;
doc.avg = doc.total / doc.num;
db.myCollection.save( doc );
return doc;
},
args: [ "eliot", 5 ]
}
);
The db in the function refers to the current database.
The shell also provides a helper method db.eval(), so you can express the above as follows:
db.eval( function(name, incAmount) {
var doc = db.myCollection.findOne( { name : name } );
doc = doc || { name : name , num : 0 , total : 0 , avg : 0 };
doc.num++;
doc.total += incAmount;
doc.avg = doc.total / doc.num;
db.myCollection.save( doc );
return doc;
},
"eliot", 5 );
You cannot pass the nolock flag to the db.eval() in the mongo shell.
If you want to use the server’s interpreter, you must run eval. Otherwise, the mongo shell’s JavaScript interpreter evaluates functions entered directly into the shell.
If an error occurs, eval throws an exception. Consider the following invalid function that uses the variable x without declaring it as an argument:
db.runCommand(
{
eval: function() { return x + x; },
args: [3]
}
)
The statement will result in the following exception:
{
"errno" : -3,
"errmsg" : "invoke failed: JS Error: ReferenceError: x is not defined nofile_b:1",
"ok" : 0
}
Warning
See also
The filemd5 command returns the MD5 hash of a single file stored using the GridFS specification. Client libraries use this command to verify that files are correctly written to MongoDB. The command takes the files_id of the file in question and the name of the GridFS root collection as arguments. For example:
{ filemd5: ObjectId("4f1f10e37671b50e4ecd2776"), root: "fs" }
The findAndModify command atomically modifies and returns a single document. By default, the returned document does not include the modifications made on the update. To return the document with the modifications made on the update, use the new option.
The command has the following syntax:
{ findAndModify: <collection>, <options> }
The findAndModify command takes the following sub-document options: query, sort, remove, update, new, fields, and upsert.
Consider the following example:
{ findAndModify: "people",
query: { name: "Tom", state: "active", rating: { $gt: 10 } },
sort: { rating: 1 },
update: { $inc: { score: 1 } }
}
This command performs the following actions:
The shell and many drivers provide a findAndModify() helper method. Using the shell helper, this same operation can take the following form:
db.people.findAndModify( {
query: { name: "Tom", state: "active", rating: { $gt: 10 } },
sort: { rating: 1 },
update: { $inc: { score: 1 } }
} );
Warning
When using findAndModify in a sharded environment, the query must contain the shard key for all operations against the sharded cluster. findAndModify operations issued against mongos instances for non-sharded collections function normally.
Note
This command obtains a write lock on the affected database and will block other operations until it has completed; however, typically the write lock is short lived and equivalent to other similar update() operations.
flushRouterConfig clears the current cluster information cached by a mongos instance and reloads all sharded cluster metadata from the config database.
This forces an update when the configuration database holds data that is newer than the data cached in the mongos process.
Warning
Do not modify the config data, except as explicitly documented. A config database cannot typically tolerate manual manipulation.
flushRouterConfig is an administrative command that is only available for mongos instances.
New in version 1.8.2.
The forceerror command is for testing purposes only. Use forceerror to force a user assertion exception. This command always returns an ok value of 0.
The fsync command forces the mongod process to flush all pending writes to the storage layer. mongod is always writing data to the storage layer as applications write more data to the database. MongoDB guarantees that it will write all data to disk within the syncdelay interval, which is 60 seconds by default.
{ fsync: 1 }
The fsync operation is synchronous by default. To run fsync asynchronously, use the following form:
{ fsync: 1, async: true }
The connection will return immediately. You can check the output of db.currentOp() for the status of the fsync operation.
The primary use of fsync is to lock the database during backup operations. This will flush all data to the data storage layer and block all write operations until you unlock the database. Consider the following command form:
{ fsync: 1, lock: true }
Note
You may continue to perform read operations on a database that has a fsync lock. However, following the first write operation all subsequent read operations wait until you unlock the database.
To check on the current state of the fsync lock, use db.currentOp(). Use the following JavaScript function in the shell to test if the database is currently locked:
serverIsLocked = function () {
var co = db.currentOp();
if (co && co.fsyncLock) {
return true;
}
return false;
}
After loading this function into your mongo shell session you can call it as follows:
serverIsLocked()
This function will return true if the database is currently locked and false if the database is not locked. To unlock the database, make a request for an unlock using the following command:
db.getSiblingDB("admin").$cmd.sys.unlock.findOne();
New in version 1.9.0: The db.fsyncLock() and db.fsyncUnlock() helpers in the shell.
In the mongo shell, you may use the db.fsyncLock() and db.fsyncUnlock() wrappers for the fsync lock and unlock process:
db.fsyncLock();
db.fsyncUnlock();
Note
fsync lock is only possible on individual shards of a sharded cluster, not on the entire sharded cluster. To backup an entire sharded cluster, please read considerations for backing up sharded clusters.
If your mongod has journaling enabled, consider using another method to back up your database.
Note
The database cannot be locked with db.fsyncLock() while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock(). Disable profiling using db.setProfilingLevel() as follows in the mongo shell:
db.setProfilingLevel(0)
The geoNear command provides an alternative to the $near operator. In addition to the functionality of $near, geoNear returns the distance of each item from the specified point along with additional diagnostic information. For example:
{ geoNear : "places" , near : [50,50], num : 10 }
Here, geoNear returns the 10 items nearest to the coordinates [50,50] in the collection named places. geoNear accepts options such as near and num; specify all distances in the same units as the document coordinate system.
The geoSearch command provides an interface to MongoDB’s haystack index functionality. These indexes are useful for returning results based on location coordinates after collecting results based on some other query (i.e. a “haystack.”) Consider the following example:
{ geoSearch : "places", near : [33, 33], maxDistance : 6, search : { type : "restaurant" }, limit : 30 }
The above command returns all documents with a type of restaurant having a maximum distance of 6 units from the coordinates [33, 33] in the collection places up to a maximum of 30 results.
Unless specified otherwise, the geoSearch command limits results to 50 documents.
The getCmdLineOpts command returns a document containing command line options used to start the given mongod:
{ getCmdLineOpts: 1 }
This command returns a document with two fields, argv and parsed. The argv field contains an array with each item from the command string used to invoke mongod. The document in the parsed field includes all runtime options, including those parsed from the command line and those specified in the configuration file, if specified.
Consider the following example output of getCmdLineOpts:
{
"argv" : [
"/usr/bin/mongod",
"--config",
"/etc/mongodb.conf",
"--fork"
],
"parsed" : {
"bind_ip" : "127.0.0.1",
"config" : "/etc/mongodb.conf",
"dbpath" : "/srv/mongodb",
"fork" : true,
"logappend" : "true",
"logpath" : "/var/log/mongodb/mongod.log",
"quiet" : "true"
},
"ok" : 1
}
The getLastError command returns the error status of the last operation on the current connection. By default MongoDB does not provide a response to confirm the success or failure of a write operation; clients typically use getLastError in combination with write operations to ensure that the write succeeds.
Consider the following prototype form.
{ getLastError: 1 }
The following options are available: w (the number of replica set members that must acknowledge the write), wtimeout (a time limit, in milliseconds, for the write concern), j (wait until the operation has been committed to the journal), and fsync (wait until the data has been flushed to disk).
See also
Write Concern, Replica Set Write Concern, and db.getLastError().
The getLog command returns a document with a log array that contains recent messages from the mongod process log. The getLog command has the following syntax:
{ getLog: <log> }
Replace <log> with one of the following values:
You may also specify an asterisk (e.g. *) as the <log> value to return a list of available log filters. The following example is from a mongo shell connected to a replica set:
db.adminCommand({getLog: "*" })
{ "names" : [ "global", "rs", "startupWarnings" ], "ok" : 1 }
getLog returns events from a RAM cache of the mongod events and does not read log data from the log file.
getParameter is an administrative command for retrieving the value of options normally set on the command line. Issue commands against the admin database as follows:
{ getParameter: 1, <option>: 1 }
The values specified for getParameter and <option> do not affect the output. The command works with the following options:
See also
setParameter for more about these parameters.
The getPrevError command returns the errors since the last resetError command.
See also
getShardMap is an internal command that supports the sharding functionality.
getShardVersion is an internal command that supports sharding functionality.
The group command groups documents in a collection by the specified key and performs simple aggregation functions such as computing counts and sums. The command is analogous to a SELECT ... GROUP BY statement in SQL. The command returns a document with the grouped records as well as the command meta-data.
The group command takes the following prototype form:
{ group: { ns: <namespace>,
key: <key>,
$reduce: <reduce function>,
$keyf: <key function>,
cond: <query>,
finalize: <finalize function> } }
The command fields are ns (the collection to operate on), key (the field or fields to group by), $reduce (an aggregation function that operates on each document and the running result), $keyf (a function that computes the grouping key, as an alternative to key), cond (a query that selects the documents to process), initial (the initial state of the aggregation result), and finalize (a function that runs on each result document before it is returned).
Warning
Note
The result set must fit within the maximum BSON document size.
Additionally, in version 2.2, the returned array can contain at most 20,000 elements; i.e. at most 20,000 unique groupings. For group by operations that result in more than 20,000 unique groupings, use mapReduce. Previous versions had a limit of 10,000 elements.
For the shell, MongoDB provides a wrapper method db.collection.group(); however, the db.collection.group() method takes the keyf field and the reduce field whereas the group command takes the $keyf field and the $reduce field.
Consider the following examples of the db.collection.group() method:
The examples assume an orders collection with documents of the following prototype:
{
_id: ObjectId("5085a95c8fada716c89d0021"),
ord_dt: ISODate("2012-07-01T04:00:00Z"),
ship_dt: ISODate("2012-07-02T04:00:00Z"),
item: { sku: "abc123",
price: 1.99,
uom: "pcs",
qty: 25 }
}
The following example groups those documents that have ord_dt greater than 01/01/2012 by the ord_dt and item.sku fields:
db.runCommand( { group:
{
ns: 'orders',
key: { ord_dt: 1, 'item.sku': 1 },
cond: { ord_dt: { $gt: new Date( '01/01/2012' ) } },
$reduce: function ( curr, result ) { },
initial: { }
}
} )
The result is a document that contains the retval field with the group by records, the count field with the total number of documents grouped, the keys field with the number of unique groupings (i.e. the number of elements in retval), and the ok field with the command status:
{ "retval" :
[ { "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc456"},
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "bcd123"},
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "efg456"},
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "efg456"},
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "ijk123"},
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc456"},
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc456"}
],
"count" : 13,
"keys" : 11,
"ok" : 1 }
The method call is analogous to the SQL statement:
SELECT ord_dt, item_sku
FROM orders
WHERE ord_dt > '01/01/2012'
GROUP BY ord_dt, item_sku
The following example groups those documents that have ord_dt greater than 01/01/2012 by the ord_dt and item.sku fields, and calculates the sum of the qty field for each grouping:
db.runCommand( { group:
{
ns: 'orders',
key: { ord_dt: 1, 'item.sku': 1 },
cond: { ord_dt: { $gt: new Date( '01/01/2012' ) } },
$reduce: function ( curr, result ) {
result.total += curr.item.qty;
},
initial: { total : 0 }
}
} )
The retval field of the returned document is an array of documents that contain the group by fields and the calculated aggregation field:
{ "retval" :
[ { "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc123", "total" : 25 },
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc456", "total" : 25 },
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "bcd123", "total" : 10 },
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "efg456", "total" : 10 },
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "abc123", "total" : 25 },
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "efg456", "total" : 15 },
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "ijk123", "total" : 20 },
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc123", "total" : 45 },
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc456", "total" : 25 },
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc123", "total" : 25 },
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc456", "total" : 25 }
],
"count" : 13,
"keys" : 11,
"ok" : 1 }
The method call is analogous to the SQL statement:
SELECT ord_dt, item_sku, SUM(item_qty) as total
FROM orders
WHERE ord_dt > '01/01/2012'
GROUP BY ord_dt, item_sku
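The accumulation that group performs can be emulated in Python. This is a hypothetical in-memory sketch of the key / initial / $reduce semantics, not how the server implements the command:

```python
def group(docs, key_fields, initial, reduce_fn):
    # Fold each document into the accumulator for its grouping key,
    # mirroring group's key, initial, and $reduce fields.
    results = {}
    for doc in docs:
        key = tuple(doc[f] for f in key_fields)
        acc = results.setdefault(key, dict(initial))  # copy of initial state
        reduce_fn(doc, acc)
    return results

# Simplified stand-ins for documents in the orders collection.
orders = [
    {"ord_dt": "2012-07-01", "sku": "abc123", "qty": 25},
    {"ord_dt": "2012-07-01", "sku": "abc123", "qty": 5},
    {"ord_dt": "2012-06-01", "sku": "efg456", "qty": 15},
]

def add_qty(curr, result):
    # Analogous to: $reduce: function(curr, result) { result.total += curr.item.qty; }
    result["total"] += curr["qty"]

totals = group(orders, ("ord_dt", "sku"), {"total": 0}, add_qty)
# totals[("2012-07-01", "abc123")] == {"total": 30}
```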
The following example groups those documents that have ord_dt greater than 01/01/2012 by the calculated day_of_week field, and calculates the sum, count, and average of the qty field for each grouping:
db.runCommand( { group:
{
ns: 'orders',
$keyf: function(doc) {
return { day_of_week: doc.ord_dt.getDay() } ; },
cond: { ord_dt: { $gt: new Date( '01/01/2012' ) } },
$reduce: function ( curr, result ) {
result.total += curr.item.qty;
result.count++;
},
initial: { total : 0, count: 0 },
finalize: function(result) {
var weekdays = [ "Sunday", "Monday", "Tuesday",
"Wednesday", "Thursday",
"Friday", "Saturday" ];
result.day_of_week = weekdays[result.day_of_week];
result.avg = Math.round(result.total / result.count);
}
}
} )
The retval field of the returned document is an array of documents that contain the group by fields and the calculated aggregation field:
{ "retval" :
[ { "day_of_week" : "Sunday", "total" : 70, "count" : 4, "avg" : 18 },
{ "day_of_week" : "Friday", "total" : 110, "count" : 6, "avg" : 18 },
{ "day_of_week" : "Tuesday", "total" : 70, "count" : 3, "avg" : 23 }
],
"count" : 13,
"keys" : 3,
"ok" : 1 }
See also
The isMaster command provides a basic overview of the current replication configuration. MongoDB drivers and clients use this command to determine what kind of member they’re connected to and to discover additional members of a replica set. The db.isMaster() method provides a wrapper around this database command.
The command takes the following form:
{ isMaster: 1 }
This command returns a document containing the following fields:
The name of the current replica set, if applicable.
A boolean value that reports when this node is writable. If true, then the current node is either a primary node in a replica set, a master node in a master-slave configuration, or a standalone mongod.
A boolean value that, when true, indicates that the current node is a secondary member of a replica set.
An array of strings in the format of “[hostname]:[port]” listing all nodes in the replica set that are not “hidden”.
The [hostname]:[port] for the current replica set primary, if applicable.
The [hostname]:[port] of the node responding to this command.
This command verifies that a process is a mongos.
If you issue the isdbgrid command when connected to a mongos, the response document includes the isdbgrid field set to 1. The returned document is similar to the following:
{ "isdbgrid" : 1, "hostname" : "app.example.net", "ok" : 1 }
If you issue the isdbgrid command when connected to a mongod, MongoDB returns an error document. The isdbgrid command is not available to mongod. The error document, however, also includes a line that reads "isdbgrid" : 1, just as in the document returned for a mongos. The error document is similar to the following:
{
"errmsg" : "no such cmd: isdbgrid",
"bad cmd" : {
"isdbgrid" : 1
},
"ok" : 0
}
You can instead use the isMaster command to determine whether you are connected to a mongos. When connected to a mongos, the isMaster command returns a document that contains the string isdbgrid in the msg field.
journalLatencyTest is an administrative command that tests the length of time required to write and perform a file system sync (e.g. fsync) for a file in the journal directory. You must issue the journalLatencyTest command against the admin database in the form:
{ journalLatencyTest: 1 }
The value (i.e. 1 above) does not affect the operation of the command.
The listCommands command generates a list of all database commands implemented for the current mongod instance.
The listDatabases command provides a list of existing databases along with basic statistics about them:
{ listDatabases: 1 }
The value (e.g. 1) does not affect the output of the command. listDatabases returns a document for each database. Each document contains a name field with the database name, a sizeOnDisk field with the total size of the database file on disk in bytes, and an empty field specifying whether the database has any data.
Use the listShards command to return a list of configured shards. The command takes the following form:
{ listShards: 1 }
The logRotate command is an administrative command that allows you to rotate the MongoDB logs to prevent a single logfile from consuming too much disk space. You must issue the logRotate command against the admin database in the form:
{ logRotate: 1 }
Note
Your mongod instance needs to be running with the --logpath [file] option.
You may also rotate the logs by sending a SIGUSR1 signal to the mongod process. If your mongod has a process ID of 2200, here’s how to send the signal on Linux:
kill -SIGUSR1 2200
logRotate renames the existing log file by appending the current timestamp to the filename. The appended timestamp has the following form:
<YYYY>-<mm>-<DD>T<HH>-<MM>-<SS>
Then logRotate creates a new log file with the same name as originally specified by the logpath setting to mongod or mongos.
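The renaming scheme can be sketched in Python. The exact separator between the original name and the timestamp is an assumption, and the function name rotated_filename is hypothetical:

```python
from datetime import datetime

def rotated_filename(logpath, now):
    # Append a <YYYY>-<mm>-<DD>T<HH>-<MM>-<SS> timestamp to the
    # existing log file name, as logRotate does when rotating.
    return logpath + "." + now.strftime("%Y-%m-%dT%H-%M-%S")

name = rotated_filename("/var/log/mongodb/mongod.log",
                        datetime(2012, 11, 4, 14, 30, 9))
# → "/var/log/mongodb/mongod.log.2012-11-04T14-30-09"
```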
Note
New in version 2.0.3: The logRotate command is available to mongod instances running on Windows systems with MongoDB release 2.0.3 and higher.
The logout command terminates the current authenticated session:
{ logout: 1 }
Note
If you are not logged in or are not using authentication, this command has no effect.
The mapReduce command allows you to run map-reduce aggregation operations over a collection. The mapReduce command has the following prototype form:
db.runCommand(
{
mapReduce: <collection>,
map: <function>,
reduce: <function>,
out: <collection>,
query: <document>,
sort: <document>,
limit: <number>,
finalize: <function>,
scope: <document>,
jsMode: <boolean>,
verbose: <boolean>
}
)
Pass the name of the collection to the mapReduce command (i.e. <collection>) to use as the source documents for the map-reduce operation. The command also accepts the optional parameters shown in the prototype above: query, sort, limit, finalize, scope, jsMode, and verbose.
Consider the following prototype mapReduce operation:
var mapFunction = function() { ... };
var reduceFunction = function(key, values) { ... };
db.runCommand(
{
mapReduce: 'orders',
map: mapFunction,
reduce: reduceFunction,
out: { merge: 'map_reduce_results' },
query: { ord_date: { $gt: new Date('01/01/2012') } }
}
)
In the mongo shell, the db.collection.mapReduce() method is a wrapper around the mapReduce command. The following examples use db.collection.mapReduce():
Consider the following map-reduce operations on a collection orders that contains documents of the following prototype:
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 250,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
Perform a map-reduce operation on the orders collection to group by cust_id and, for each cust_id, calculate the sum of the price field:
Define the map function to process each input document:
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function.
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace the contents with the results of this map-reduce operation.
In this example you will perform a map-reduce operation on the orders collection, for all documents that have an ord_date value greater than 01/01/2012. The operation groups by the item.sku field, and for each sku calculates the number of orders and the total quantity ordered. The operation concludes by calculating the average quantity per order for each sku value:
Define the map function to process each input document:
var mapFunction2 = function() {
for (var idx = 0; idx < this.items.length; idx++) {
var key = this.items[idx].sku;
var value = {
count: 1,
qty: this.items[idx].qty
};
emit(key, value);
}
};
Define the corresponding reduce function with two arguments keySKU and valuesCountObjects:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
var reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
Define a finalize function with two arguments key and reducedValue. The function modifies the reducedValue object to add a computed field named average and returns the modified object:
var finalizeFunction2 = function (key, reducedValue) {
reducedValue.average = reducedValue.qty/reducedValue.count;
return reducedValue;
};
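Because map, reduce, and finalize are ordinary JavaScript functions, you can sanity-check their logic outside the server. The following sketch simulates, in plain JavaScript, what the server does with the values emitted for a single key; the sample values are hypothetical and this is not how mongod actually invokes the functions:

```javascript
// Redefine the reduce and finalize functions from above.
var reduceFunction2 = function(keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

var finalizeFunction2 = function (key, reducedValue) {
  reducedValue.average = reducedValue.qty / reducedValue.count;
  return reducedValue;
};

// Hypothetical mapped values for sku "mmm": two orders, qty 5 and qty 10.
var mapped = [ { count: 1, qty: 5 }, { count: 1, qty: 10 } ];
var result = finalizeFunction2("mmm", reduceFunction2("mmm", mapped));
// result: { count: 2, qty: 15, average: 7.5 }
```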
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions.
db.orders.mapReduce( mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example" },
query: { ord_date: { $gt: new Date('01/01/2012') } },
finalize: finalizeFunction2
}
)
This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'), then outputs the results to the collection map_reduce_example. If the map_reduce_example collection already exists, the operation will merge the existing contents with the results of this map-reduce operation.
For more information and examples, see the Map-Reduce page.
Provides internal functionality to support map-reduce in sharded environments.
_migrateClone is an internal command. Do not call directly.
moveChunk is an internal administrative command that moves chunks between shards. You must issue the moveChunk command against the admin database in the form:
db.runCommand( { moveChunk : <namespace> ,
find : <query> ,
to : <destination>,
<options> } )
If you set _secondaryThrottle to true, then during chunk migrations when a shard is hosted by a replica set, the mongod will wait until the secondary members replicate the migration operations before continuing to migrate chunk data. You may also configure _secondaryThrottle in the balancer configuration.
Use the sh.moveChunk() helper in the mongo shell to migrate chunks manually.
The chunk migration section describes how chunks move between shards on MongoDB.
moveChunk will return the following if another cursor is using the chunk you are moving:
errmsg: "The collection's metadata lock is already taken."
These errors usually occur when there are too many open cursors accessing the chunk you are migrating. You can either wait until the cursors complete their operation or close the cursors manually.
Note
Only use the moveChunk command in special circumstances, such as preparing your sharded cluster for an initial ingestion of data or a large bulk import operation. See Create Chunks (Pre-Splitting) for more information.
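A concrete invocation might look like the following, using hypothetical namespace and shard names:

```javascript
// Move the chunk of test.people that contains { _id: 99 } to shard0001.
// The namespace and shard name here are illustrative.
db.adminCommand( { moveChunk: "test.people",
                   find: { _id: 99 },
                   to: "shard0001" } )
```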
In a sharded cluster, this command reassigns the database’s primary shard, which holds all un-sharded collections in the database. movePrimary is an administrative command that is only available for mongos instances. Only use movePrimary when removing a shard from a sharded cluster.
Important
Only use movePrimary when removing a shard from a sharded cluster. See Remove Shards from an Existing Sharded Cluster for a complete procedure.
movePrimary changes the primary shard for this database in the cluster metadata, and migrates all un-sharded collections to the specified shard. Use the command with the following form:
{ movePrimary : "test", to : "shard0001" }
When the command returns, the database’s primary location will shift to the designated shard. To fully decommission a shard, use the removeShard command.
The ping command is a no-op used to test whether a server is responding to commands. This command will return immediately even if the server is write-locked:
{ ping: 1 }
The value (e.g. 1 above) does not impact the behavior of the command.
Returns data regarding the status of a sharded cluster and includes information regarding the distribution of chunks. printShardingStatus is only available when connected to a sharded cluster via a mongos. Typically, you will use the sh.status() mongo shell wrapper to access this data.
Use the profile command to enable, disable, or change the query profiling level. This allows administrators to capture data regarding performance. The database profiling system can impact performance and can allow the server to write the contents of queries to the log, which might have information security implications for your deployment. Consider the following prototype syntax:
{ profile: <level> }
The following profiling levels are available:
| Level | Setting |
|---|---|
| 0 | Off. No profiling. |
| 1 | On. Only includes slow operations. |
| 2 | On. Includes all operations. |
You may optionally set a threshold in milliseconds for profiling using the slowms option, as follows:
{ profile: 1, slowms: 200 }
mongod writes the output of the database profiler to the system.profile collection.
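You can inspect recent profiler output by querying that collection directly; for example, assuming profiling is enabled on the current database:

```javascript
// Return the five most recent profiler entries, newest first.
db.system.profile.find().sort( { ts: -1 } ).limit(5)
```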
mongod writes queries that take longer than the slowms threshold to the log, even when the database profiler is not active.
See also
Database Profiling provides additional documentation regarding database profiling.
See also
“db.getProfilingStatus()” and “db.setProfilingLevel()” provide wrappers around this functionality in the mongo shell.
Note
The database cannot be locked with db.fsyncLock() while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock(). Disable profiling using db.setProfilingLevel() as follows in the mongo shell:
db.setProfilingLevel(0)
Note
This command obtains a write lock on the affected database and will block other operations until it has completed; however, the write lock is only in place while enabling and disabling the profiler, which is typically a short operation.
The reIndex command rebuilds all indexes for a specified collection. Use the following syntax:
{ reIndex: "collection" }
Normally, MongoDB compacts indexes during routine updates. For most users, the reIndex command is unnecessary. However, it may be worth running if the collection size has changed significantly or if the indexes are consuming a disproportionate amount of disk space.
Note that the reIndex command will block the server against writes and may take a long time for large collections.
Call reIndex using the following form:
db.collection.reIndex();
Warning
This command obtains a write lock on the affected database and will block other operations until it has completed.
_recvChunkAbort is an internal command. Do not call directly.
_recvChunkCommit is an internal command. Do not call directly.
_recvChunkStart is an internal command. Do not call directly.
Warning
This command obtains a write lock on the affected database and will block other operations until it has completed.
_recvChunkStatus is an internal command. Do not call directly.
Starts the process of removing a shard from a cluster. This is a multi-stage process. Begin by issuing the following command:
{ removeShard : "[shardName]" }
The balancer will then begin migrating chunks from the shard specified by [shardName]. This process happens slowly to avoid placing undue load on the overall cluster.
The command returns immediately, with the following message:
{ msg : "draining started successfully" , state: "started" , shard: "shardName" , ok : 1 }
If you run the command again, you’ll see the following progress output:
{ msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 23 , dbs: 1 }, ok: 1 }
The remaining document specifies how many chunks and databases remain on the shard. Use printShardingStatus to list the databases that you must move from the shard.
Each database in a sharded cluster has a primary shard. If the shard you want to remove is also the primary of one of the cluster's databases, then you must manually move the database to a new shard. This can occur only after the shard is empty. See the movePrimary command for details.
After removing all chunks and databases from the shard, you may issue the command again, to return:
{ msg: "remove shard completed successfully" , state: "completed", host: "shardName", ok : 1 }
The renameCollection command is an administrative command that changes the name of an existing collection. You specify collections to renameCollection in the form of a complete namespace, which includes the database name. To rename a collection, issue the renameCollection command against the admin database in the form:
{ renameCollection: <source-namespace>, to: <target-namespace>[, dropTarget: <boolean> ] }
The dropTarget argument is optional.
If you specify a collection to the to argument in a different database, the renameCollection command will copy the collection to the new database and then drop the source collection.
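For example, to rename a collection source in the test database to target (hypothetical names), issue the following against the admin database:

```javascript
// Collection names here are illustrative.
db.adminCommand( { renameCollection: "test.source", to: "test.target" } )
```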
You can use renameCollection in production environments; however:
Warning
renameCollection will fail if target is the name of an existing collection and you do not specify dropTarget: true.
If the renameCollection operation does not complete, the target collection and indexes will not be usable and will require manual intervention to clean up.
The shell helper db.collection.renameCollection() provides a simpler interface to this command for renaming a collection within the same database. For example:
db.source-namespace.renameCollection( "target" )
Warning
You cannot use renameCollection with sharded collections.
Warning
This command obtains a global write lock and will block other operations until it has completed.
Warning
In general, if you have an intact copy of your data, such as would exist on a very recent backup or an intact member of a replica set, do not use repairDatabase or related options like db.repairDatabase() in the mongo shell or mongod --repair. Restore from an intact copy of your data.
Note
When using journaling, there is almost never any need to run repairDatabase. In the event of an unclean shutdown, the server will be able to restore the data files to a pristine state automatically.
The repairDatabase command checks and repairs errors and inconsistencies with the data storage. The command is analogous to a fsck command for file systems.
If your mongod instance is not running with journaling, the system experiences an unexpected restart or crash, and you have no other intact replica set members with this data, you should run the repairDatabase command to ensure that there are no errors in the data storage.
As a side effect, the repairDatabase command will compact the database, as with the compact command, and also reduce the total size of the data files on disk. The repairDatabase command will also recreate all indexes in the database.
Use the following syntax:
{ repairDatabase: 1 }
Be aware that this command can take a long time to run if your database is large. In addition, it requires a quantity of free disk space equal to the size of your database. If you lack sufficient free space on the same volume, you can mount a separate volume and use that for the repair. In this case, you must start mongod from the command line and use the --repairpath option to specify the folder in which to store the temporary repair files.
Warning
This command obtains a global write lock and will block other operations until it has completed.
This command is accessible via a number of different avenues. You may:
Use the mongo shell to run the command, as above.
Use the db.repairDatabase() in the mongo shell.
Run mongod directly from your system’s shell. Make sure that mongod isn’t already running, and that you issue this command as a user that has access to MongoDB’s data files. Run as:
$ mongod --repair
To add a repair path:
$ mongod --repair --repairpath /opt/vol2/data
Note
This command will fail if your database is not a master or primary. In most cases, you should recover a corrupt secondary using the data from an existing intact node. If you must repair a secondary or slave node, first restart the node as a standalone mongod by omitting the --replSet or --slave options, as necessary.
replSetElect is an internal command that supports replica set functionality.
The replSetFreeze command prevents a replica set member from seeking election for the specified number of seconds. Use this command in conjunction with the replSetStepDown command to make a different node in the replica set a primary.
The replSetFreeze command uses the following syntax:
{ replSetFreeze: <seconds> }
If you want to unfreeze a replica set member before the specified number of seconds has elapsed, you can issue the command with a seconds value of 0:
{ replSetFreeze: 0 }
Restarting the mongod process also unfreezes a replica set member.
replSetFreeze is an administrative command, and you must issue it against the admin database.
replSetFresh is an internal command that supports replica set functionality.
replSetGetRBID is an internal command that supports replica set functionality.
The replSetGetStatus command returns the status of the replica set from the point of view of the current server. You must run the command against the admin database. The command has the following prototype format:
{ replSetGetStatus: 1 }
However, you can also run this command from the shell like so:
rs.status()
See also
“Replica Set Status Reference” and “Replication Fundamentals“
replSetHeartbeat is an internal command that supports replica set functionality.
The replSetInitiate command initializes a new replica set. Use the following syntax:
{ replSetInitiate : <config_document> }
The <config_document> is a document that specifies the replica set’s configuration. For instance, here’s a config document for creating a simple 3-member replica set:
{
_id : <setname>,
members : [
{_id : 0, host : <host0>},
{_id : 1, host : <host1>},
{_id : 2, host : <host2>},
]
}
A typical way of running this command is to assign the config document to a variable and then to pass the document to the rs.initiate() helper:
config = {
_id : "my_replica_set",
members : [
{_id : 0, host : "rs1.example.net:27017"},
{_id : 1, host : "rs2.example.net:27017"},
{_id : 2, host : "rs3.example.net", arbiterOnly: true},
]
}
rs.initiate(config)
Notice that omitting the port causes the host to use the default port of 27017. Notice also that you can specify other options in the config document, such as the arbiterOnly setting in this example.
See also
“Replica Set Configuration,” “Replica Set Administration,” and “Replica Set Reconfiguration.”
The replSetMaintenance admin command enables or disables the maintenance mode for a secondary member of a replica set.
The command has the following prototype form:
{ replSetMaintenance: <boolean> }
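For example, to place a secondary into maintenance mode and later return it to normal operation, you might run the following against the admin database:

```javascript
db.adminCommand( { replSetMaintenance: true } )   // enter maintenance mode
db.adminCommand( { replSetMaintenance: false } )  // resume normal operation
```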
Consider the following behavior when running the replSetMaintenance command:
The replSetReconfig command modifies the configuration of an existing replica set. You can use this command to add and remove members, and to alter the options set on existing members. Use the following syntax:
{ replSetReconfig: <new_config_document>, force: false }
You may also run the command using the shell’s rs.reconfig() method.
Be aware of the following replSetReconfig behaviors:
You must issue this command against the admin database of the current primary member of the replica set.
You can optionally force the replica set to accept the new configuration by specifying force: true. Use this option if the current member is not primary or if a majority of the members of the set are not accessible.
Warning
Forcing the replSetReconfig command can lead to a rollback situation. Use with caution.
Use the force option to restore a replica set to new servers with different hostnames. This works even if the set members already have a copy of the data.
A majority of the set’s members must be operational for the changes to propagate properly.
This command can cause downtime as the set renegotiates primary-status. Typically this is 10-20 seconds, but could be as long as a minute or more. Therefore, you should attempt to reconfigure only during scheduled maintenance periods.
In some cases, replSetReconfig forces the current primary to step down, initiating an election for primary among the members of the replica set. When this happens, the set will drop all current connections.
Note
replSetReconfig obtains a special mutually exclusive lock to prevent more than one replSetReconfig operation from occurring at the same time.
The replSetStepDown command forces the primary of the replica set to relinquish its status as primary. This initiates an election for primary. You may specify a number of seconds for the node to avoid election to primary:
{ replSetStepDown: <seconds> }
If you do not specify a value for <seconds>, replSetStepDown will attempt to avoid reelection to primary for 60 seconds.
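For example, to step down the current primary and prevent it from seeking reelection for 120 seconds (an illustrative value):

```javascript
// Issue against the admin database on the current primary.
db.adminCommand( { replSetStepDown: 120 } )
```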
Warning
This will force all clients currently connected to the database to disconnect. This helps ensure that clients maintain an accurate view of the replica set.
New in version 2.0: If no secondary is within 10 seconds of the primary, replSetStepDown will not succeed, in order to prevent long-running elections.
New in version 2.2.
replSetSyncFrom allows you to explicitly configure which host the current mongod will poll oplog entries from. This operation may be useful for testing different patterns and in situations where a set member is not syncing from the host you want. The member to sync from must be a valid source for data in the set; a member of a replica set cannot sync from:
If you attempt to sync from a member that is more than 10 seconds behind the current member, mongod will return and log a warning, but it will still sync from such a member.
The command has the following prototype form:
{ replSetSyncFrom: "[hostname]:[port]" }
To run the command in the mongo shell, use the following invocation:
db.adminCommand( { replSetSyncFrom: "[hostname]:[port]" } )
You may also use the rs.syncFrom() helper in the mongo shell, in an operation with the following form:
rs.syncFrom("[hostname]:[port]")
Note
replSetSyncFrom provides a temporary override of default behavior. When you restart the mongod instance, or when the connection that the mongod uses to sync closes, the mongod will revert to the default logic for selecting a sync source.
replSetTest is an internal diagnostic command used in regression tests of replica set functionality.
The resetError command resets the last error status.
The resync command forces an out-of-date slave mongod instance to re-synchronize itself. Note that this command is relevant to master-slave replication only. It does not apply to replica sets.
Warning
This command obtains a global write lock and will block other operations until it has completed.
The serverStatus command returns a document that provides an overview of the database process's state. Most monitoring applications run this command at a regular interval to collect statistics about the instance:
{ serverStatus: 1 }
The value (i.e. 1 above), does not affect the operation of the command.
setParameter is an administrative command for modifying options normally set on the command line. You must issue the setParameter command against the admin database in the form:
{ setParameter: 1, <option>: <value> }
Replace <option> with one of the runtime options supported by this command, such as logLevel or syncdelay.
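For example, to increase the verbosity of logging at runtime using the logLevel option (values range from 0 to 5):

```javascript
// Issue against the admin database.
db.adminCommand( { setParameter: 1, logLevel: 2 } )
```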
setShardVersion is an internal command that supports sharding functionality.
The shardCollection command marks a collection for sharding and will allow data to begin distributing among shards. You must run enableSharding on a database before running the shardCollection command.
{ shardCollection: "<db>.<collection>", key: <shardkey> }
This enables sharding for the collection specified by <collection> in the database named <db>, using the key <shardkey> to distribute documents among the shards. <shardkey> is a document and takes the same form as an index specification document.
Choosing the right shard key to effectively distribute load among your shards requires some planning.
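For example, to shard a people collection in the records database on a zipcode field (hypothetical names), run the following against the admin database after enabling sharding for records:

```javascript
// Collection and field names here are illustrative.
db.adminCommand( { shardCollection: "records.people", key: { zipcode: 1 } } )
```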
See also
Sharding for more information related to sharding. Also consider the section on Shard Keys for documentation regarding shard keys.
Warning
There’s no easy way to disable sharding after running shardCollection. In addition, you cannot change shard keys once set. If you must convert a sharded cluster to a standalone node or replica set, you must make a single backup of the entire cluster and then restore the backup to the standalone mongod or the replica set.
The shardingState command returns true if the mongod instance is a member of a sharded cluster. Run the command using the following syntax:
{ shardingState: 1 }
Warning
This command obtains a write lock on the affected database and will block other operations until it has completed; however, the operation is typically short lived.
The shutdown command cleans up all database resources and then terminates the process. You must issue the shutdown command against the admin database in the form:
{ shutdown: 1 }
Note
Run the shutdown command against the admin database. When using shutdown, the connection must originate from localhost or use an authenticated connection.
If the node you’re trying to shut down is a replica set primary, then the command will succeed only if there exists a secondary node whose oplog data is within 10 seconds of the primary. You can override this protection using the force option:
{ shutdown: 1, force: true }
Alternatively, the shutdown command also supports a timeoutSecs argument which allows you to specify a number of seconds to wait for other members of the replica set to catch up:
{ shutdown: 1, timeoutSecs: 60 }
The equivalent mongo shell helper syntax looks like this:
db.shutdownServer({timeoutSecs: 60});
_skewClockCommand is an internal command. Do not call directly.
sleep is an internal command for testing purposes. The sleep command forces the database to block all operations. It takes a boolean w option, which determines whether to take a write lock rather than a read lock, and a secs option that specifies the number of seconds to sleep:
{ sleep: { w: true, secs: <seconds> } }
The above command places the mongod instance in a "write-lock" state for the specified number of seconds (i.e. <seconds>). Without arguments, sleep causes a "read lock" for 100 seconds.
Warning
sleep claims the lock specified in the w argument and blocks all operations on the mongod instance for the specified amount of time.
The split command creates new chunks in a sharded environment. While splitting is typically managed automatically by the mongos instances, this command makes it possible for administrators to manually create splits.
In normal operation there is no need to manually split chunks.
The balancer and other sharding infrastructure will automatically create chunks in the course of normal operations. See Sharding Internals for more information.
Consider the following example:
db.runCommand( { split : "test.people" , find : { _id : 99 } } )
This command inserts a new split in the collection named people in the test database. This will split the chunk that contains the document that matches the query { _id : 99 } in half. If the document specified by the query does not (yet) exist, the split will divide the chunk where that document would exist.
The split divides the chunk in half, and does not split the chunk using the identified document as the middle. To define an arbitrary split point, use the following form:
db.runCommand( { split : "test.people" , middle : { _id : 99 } } )
This form is typically used when pre-splitting data in a collection.
split is an administrative command that is only available for mongos instances.
splitChunk is an internal command. Use the sh.splitFind() and sh.splitAt() functions in the mongo shell to access this functionality.
_testDistLockWithSkew is an internal command. Do not call directly.
_testDistLockWithSyncCluster is an internal command. Do not call directly.
The top command is an administrative command that returns raw usage data for each database, providing the amount of time, in microseconds, spent and a count of operations for a number of event types.
You must issue the top command against the admin database in the form:
{ top: 1 }
New in version 2.2.
The touch command loads data from the data storage layer into memory. touch can load the data (i.e. documents), the indexes, or both. Use this command to ensure that a collection and/or its indexes are in memory before another operation. By loading the collection or indexes into memory, mongod will ideally be able to perform subsequent operations more efficiently. The touch command has the following prototypical form:
{ touch: [collection], data: [boolean], index: [boolean] }
By default, data and index are false, and touch will perform no operation. For example, to load both the data and the index for a collection named records, you would use the following command in the mongo shell:
db.runCommand({ touch: "records", data: true, index: true })
touch will not block read and write operations on a mongod, and can run on secondary members of replica sets.
Note
Using touch to control or tweak what a mongod stores in memory may displace other records' data in memory and hinder performance. Use with caution in production systems.
_transferMods is an internal command. Do not call directly.
unsetSharding is an internal command that supports sharding functionality.
The validate command checks the contents of a namespace by scanning a collection’s data and indexes for correctness. The command can be slow, particularly on larger data sets:
{ validate: "users" }
This command will validate the contents of the collection named users. You may also specify one of the following options:
full: true provides a more thorough scan of the data.
scandata: false skips the scan of the base collection without skipping the scan of the index.
The mongo shell also provides a wrapper:
db.collection.validate();
Use one of the following forms to perform the full collection validation:
db.collection.validate(true)
db.runCommand( { validate: "collection", full: true } )
Warning
This command is resource intensive and may have an impact on the performance of your MongoDB instance.
whatsmyuri is an internal command.
writebacklisten is an internal command.
writeBacksQueued is an internal command that returns a document reporting whether there are operations in the writeback queue for the given mongos, along with information about those queues. The returned document contains the following fields:
hasOpsQueued (boolean): true if there are writeback operations queued.
totalOpsQueued (integer): the number of operations queued.
queues (document): a sub-document in which each field is a writeback queue. Each of these fields holds a document with two fields, n and minutesSinceLastCall, that report on the state of the queue.
The command document has the following prototype form:
{writeBacksQueued: 1}
To call writeBacksQueued from the mongo shell, use the following db.runCommand() form:
db.runCommand({writeBacksQueued: 1})
Consider the following example output:
{
"hasOpsQueued" : true,
"totalOpsQueued" : 7,
"queues" : {
"50b4f09f6671b11ff1944089" : { "n" : 0, "minutesSinceLastCall" : 1 },
"50b4f09fc332bf1c5aeaaf59" : { "n" : 0, "minutesSinceLastCall" : 0 },
"50b4f09f6671b1d51df98cb6" : { "n" : 0, "minutesSinceLastCall" : 0 },
"50b4f0c67ccf1e5c6effb72e" : { "n" : 0, "minutesSinceLastCall" : 0 },
"50b4faf12319f193cfdec0d1" : { "n" : 0, "minutesSinceLastCall" : 4 },
"50b4f013d2c1f8d62453017e" : { "n" : 0, "minutesSinceLastCall" : 0 },
"50b4f0f12319f193cfdec0d1" : { "n" : 0, "minutesSinceLastCall" : 1 }
},
"ok" : 1
}
Returns: The timestamp portion of the ObjectId() object as a Date.
In the following example, call the getTimestamp() method on an ObjectId (e.g. ObjectId("507c7f79bcf86cd7994f6c0e")), as follows:
ObjectId("507c7f79bcf86cd7994f6c0e").getTimestamp()
This will return the following output:
ISODate("2012-10-15T21:26:17Z")
Returns: The string representation of the ObjectId() object. This value has the format of ObjectId(...).
Changed in version 2.2: In previous versions ObjectId.toString() returned the value of the ObjectId as a hexadecimal string.
In the following example, call the toString() method on an ObjectId (e.g. ObjectId("507c7f79bcf86cd7994f6c0e")), as follows:
ObjectId("507c7f79bcf86cd7994f6c0e").toString()
This will return the following string:
ObjectId("507c7f79bcf86cd7994f6c0e")
You can confirm the type of this object using the following operation:
typeof ObjectId("507c7f79bcf86cd7994f6c0e").toString()
Returns: The value of the ObjectId() object as a lowercase hexadecimal string. This value is the str attribute of the ObjectId() object.
Changed in version 2.2: In previous versions ObjectId.valueOf() returned the ObjectId() object.
In the following example, call the valueOf() method on an ObjectId (e.g. ObjectId("507c7f79bcf86cd7994f6c0e")), as follows:
ObjectId("507c7f79bcf86cd7994f6c0e").valueOf()
This will return the following string:
507c7f79bcf86cd7994f6c0e
You can confirm the type of this object using the following operation:
typeof ObjectId("507c7f79bcf86cd7994f6c0e").valueOf()
Returns the contents of the specified file.
This function returns with output relative to the current shell session, and does not impact the server.
Changes the current working directory to the specified path.
This function returns with output relative to the current shell session, and does not impact the server.
Note
This feature is not yet implemented.
The batchSize() method specifies the number of documents to return in each batch of the response from the MongoDB instance. In most cases, modifying the batch size will not affect the user or the application since the mongo shell and most drivers return results as if MongoDB returned a single batch.
The batchSize() method takes the following parameter:
Note
Specifying 1 or a negative number is analogous to using the limit() method.
Consider the following example of the batchSize() method in the mongo shell:
db.inventory.find().batchSize(10)
This operation sets the batch size for the results of a query (i.e. find()) to 10. This setting does not change the output in the mongo shell, which always iterates over the first 20 documents.
The count() method counts the number of documents referenced by a cursor. Append the count() method to a find() query to return the number of matching documents, as in the following prototype:
db.collection.find().count()
This operation does not actually perform the find(); instead, the operation counts the results that would be returned by the find().
The count() method can accept the following argument:
MongoDB also provides the shell wrapper db.collection.count() for the db.collection.find().count() construct.
Consider the following examples of the count() method:
Count the number of all documents in the orders collection:
db.orders.find().count()
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012'):
db.orders.find( { ord_dt: { $gt: new Date('01/01/2012') } } ).count()
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012') taking into account the effect of the limit(5):
db.orders.find( { ord_dt: { $gt: new Date('01/01/2012') } } ).limit(5).count(true)
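The effect of the true argument in the last example can be pictured with a small plain-JavaScript sketch (illustrative only, not shell code; countMatches is a hypothetical helper): without it the count ignores limit() and skip(), and with it the count honors them.

```javascript
// Illustrative sketch of count()'s behavior with and without applying
// skip/limit; `countMatches` is not part of the shell API.
function countMatches(totalMatching, opts, applySkipLimit) {
  if (!applySkipLimit) return totalMatching;        // count all matches
  var n = totalMatching - (opts.skip || 0);         // drop skipped documents
  if (n < 0) n = 0;
  if (opts.limit !== undefined && n > opts.limit) n = opts.limit;
  return n;
}

var withLimit = countMatches(12, { limit: 5 }, true);   // honors limit(5)
var ignoring = countMatches(12, { limit: 5 }, false);   // counts all matches
```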
The cursor.explain() method provides information on the query plan. The query plan is the plan the server uses to find the matches for a query. This information may be useful when optimizing a query.
Returns: A document that describes the process used to return the query results.
Retrieve the query plan by appending explain() to a find() query, as in the following example:
db.products.find().explain()
For details on the output, see Explain Output.
explain runs the actual query to determine the result. Although there are some differences between running the query with explain and running without, generally, the performance will be similar between the two. So, if the query is slow, the explain operation is also slow.
Additionally, the explain operation reevaluates a set of candidate query plans, which may cause the explain operation to perform differently than a normal query. As a result, these operations generally provide an accurate account of how MongoDB would perform the query, but do not reflect the timing of these queries.
To determine the performance of a particular index, you can use hint() in conjunction with explain(), as in the following example:
db.products.find().hint( { type: 1 } ).explain()
When you run explain with hint(), the query optimizer does not reevaluate the query plans.
Note
In some situations, the explain() operation may differ from the actual query plan used by MongoDB in a normal query.
The explain() operation evaluates the set of query plans and reports on the winning plan for the query. In normal operations the query optimizer caches winning query plans and uses them for similar related queries in the future. As a result MongoDB may sometimes select query plans from the cache that are different from the plan displayed using explain.
Provides the ability to loop or iterate over the cursor returned by a db.collection.find() query and returns each result on the shell. Specify a JavaScript function as the argument for the cursor.forEach() function. Consider the following example:
db.users.find().forEach( function(u) { print("user: " + u.name); } );
See also
cursor.map() for similar functionality.
Returns: Boolean.
cursor.hasNext() returns true if the cursor returned by the db.collection.find() query can iterate further to return more documents.
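A common shell idiom is to drive the cursor manually, e.g. while (cursor.hasNext()) { printjson(cursor.next()); }. The iteration contract can be sketched in plain JavaScript with a cursor over an in-memory array (illustrative; makeCursor is a hypothetical helper, not a real API):

```javascript
// Minimal sketch of the hasNext()/next() contract over an array.
function makeCursor(docs) {
  var i = 0;
  return {
    hasNext: function () { return i < docs.length; },
    next: function () { return docs[i++]; }
  };
}

var cursor = makeCursor([{ name: "ann" }, { name: "bob" }]);
var names = [];
while (cursor.hasNext()) { names.push(cursor.next().name); }
```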
Call this method on a query to override MongoDB’s default index selection and query optimization process. The argument is an index specification, like the argument to ensureIndex(). Use db.collection.getIndexes() to return the list of current indexes on a collection.
See also
“$hint“
Use the cursor.limit() method on a cursor to specify the maximum number of documents the cursor will return. cursor.limit() is analogous to the LIMIT statement in a SQL database.
Note
You must apply cursor.limit() to the cursor before retrieving any documents from the database.
Use cursor.limit() to maximize performance and prevent MongoDB from returning more results than required for processing.
A cursor.limit() value of 0 (e.g. “.limit(0)”) is equivalent to setting no limit.
Applies a function to each document visited by the cursor and collects the return values from successive applications into an array. Consider the following example:
db.users.find().map( function(u) { return u.name; } );
See also
cursor.forEach() for similar functionality.
The max() method specifies the exclusive upper bound for a specific index in order to constrain the results of find(). max() provides a way to specify an upper bound on compound key indexes.
max() takes the following parameter:
Consider the following example of max(), which assumes a collection named products that holds the following documents:
{ "_id" : 6, "item" : "apple", "type" : "cortland", "price" : 1.29 }
{ "_id" : 2, "item" : "apple", "type" : "fuji", "price" : 1.99 }
{ "_id" : 1, "item" : "apple", "type" : "honey crisp", "price" : 1.99 }
{ "_id" : 3, "item" : "apple", "type" : "jonagold", "price" : 1.29 }
{ "_id" : 4, "item" : "apple", "type" : "jonathan", "price" : 1.29 }
{ "_id" : 5, "item" : "apple", "type" : "mcintosh", "price" : 1.29 }
{ "_id" : 7, "item" : "orange", "type" : "cara cara", "price" : 2.99 }
{ "_id" : 10, "item" : "orange", "type" : "navel", "price" : 1.39 }
{ "_id" : 9, "item" : "orange", "type" : "satsuma", "price" : 1.99 }
{ "_id" : 8, "item" : "orange", "type" : "valencia", "price" : 0.99 }
The collection has the following indexes:
{ "_id" : 1 }
{ "item" : 1, "type" : 1 }
{ "item" : 1, "type" : -1 }
{ "price" : 1 }
Using the ordering of the { item: 1, type: 1 } index, max() limits the query to the documents that are below the bound of item equal to apple and type equal to jonagold:
db.products.find().max( { item: 'apple', type: 'jonagold' } ).hint( { item: 1, type: 1 } )
The query returns the following documents:
{ "_id" : 6, "item" : "apple", "type" : "cortland", "price" : 1.29 }
{ "_id" : 2, "item" : "apple", "type" : "fuji", "price" : 1.99 }
{ "_id" : 1, "item" : "apple", "type" : "honey crisp", "price" : 1.99 }
If the query did not explicitly specify the index with the hint() method, it is ambiguous as to whether mongod would select the { item: 1, type: 1 } index ordering or the { item: 1, type: -1 } index ordering.
Using the ordering of the index { price: 1 }, max() limits the query to the documents that are below the index key bound of price equal to 1.99 and min() limits the query to the documents that are at or above the index key bound of price equal to 1.39:
db.products.find().min( { price: 1.39 } ).max( { price: 1.99 } ).hint( { price: 1 } )
The query returns the following documents:
{ "_id" : 6, "item" : "apple", "type" : "cortland", "price" : 1.29 }
{ "_id" : 4, "item" : "apple", "type" : "jonathan", "price" : 1.29 }
{ "_id" : 5, "item" : "apple", "type" : "mcintosh", "price" : 1.29 }
{ "_id" : 3, "item" : "apple", "type" : "jonagold", "price" : 1.29 }
{ "_id" : 10, "item" : "orange", "type" : "navel", "price" : 1.39 }
Note
Because max() requires an index on a field, and forces the query to use this index, you may prefer the $lt operator for the query if possible. Consider the following example:
db.products.find( { _id: 7 } ).max( { price: 1.39 } )
The query will use the index on the price field, even if the index on _id may be better.
max() exists primarily to support the mongos (sharding) process.
If you use max() with min() to specify a range, the index bounds specified in min() and max() must both refer to the keys of the same index.
The min() method specifies the inclusive lower bound for a specific index in order to constrain the results of find(). min() provides a way to specify lower bounds on compound key indexes.
min() takes the following parameter:
Consider the following example of min(), which assumes a collection named products that holds the following documents:
{ "_id" : 6, "item" : "apple", "type" : "cortland", "price" : 1.29 }
{ "_id" : 2, "item" : "apple", "type" : "fuji", "price" : 1.99 }
{ "_id" : 1, "item" : "apple", "type" : "honey crisp", "price" : 1.99 }
{ "_id" : 3, "item" : "apple", "type" : "jonagold", "price" : 1.29 }
{ "_id" : 4, "item" : "apple", "type" : "jonathan", "price" : 1.29 }
{ "_id" : 5, "item" : "apple", "type" : "mcintosh", "price" : 1.29 }
{ "_id" : 7, "item" : "orange", "type" : "cara cara", "price" : 2.99 }
{ "_id" : 10, "item" : "orange", "type" : "navel", "price" : 1.39 }
{ "_id" : 9, "item" : "orange", "type" : "satsuma", "price" : 1.99 }
{ "_id" : 8, "item" : "orange", "type" : "valencia", "price" : 0.99 }
The collection has the following indexes:
{ "_id" : 1 }
{ "item" : 1, "type" : 1 }
{ "item" : 1, "type" : -1 }
{ "price" : 1 }
Using the ordering of the { item: 1, type: 1 } index, min() limits the query to the documents that are at or above the index key bound of item equal to apple and type equal to jonagold, as in the following:
db.products.find().min( { item: 'apple', type: 'jonagold' } ).hint( { item: 1, type: 1 } )
The query returns the following documents:
{ "_id" : 3, "item" : "apple", "type" : "jonagold", "price" : 1.29 }
{ "_id" : 4, "item" : "apple", "type" : "jonathan", "price" : 1.29 }
{ "_id" : 5, "item" : "apple", "type" : "mcintosh", "price" : 1.29 }
{ "_id" : 7, "item" : "orange", "type" : "cara cara", "price" : 2.99 }
{ "_id" : 10, "item" : "orange", "type" : "navel", "price" : 1.39 }
{ "_id" : 9, "item" : "orange", "type" : "satsuma", "price" : 1.99 }
{ "_id" : 8, "item" : "orange", "type" : "valencia", "price" : 0.99 }
If the query did not explicitly specify the index with the hint() method, it is ambiguous as to whether mongod would select the { item: 1, type: 1 } index ordering or the { item: 1, type: -1 } index ordering.
Using the ordering of the index { price: 1 }, min() limits the query to the documents that are at or above the index key bound of price equal to 1.39 and max() limits the query to the documents that are below the index key bound of price equal to 1.99:
db.products.find().min( { price: 1.39 } ).max( { price: 1.99 } ).hint( { price: 1 } )
The query returns the following documents:
{ "_id" : 6, "item" : "apple", "type" : "cortland", "price" : 1.29 }
{ "_id" : 4, "item" : "apple", "type" : "jonathan", "price" : 1.29 }
{ "_id" : 5, "item" : "apple", "type" : "mcintosh", "price" : 1.29 }
{ "_id" : 3, "item" : "apple", "type" : "jonagold", "price" : 1.29 }
{ "_id" : 10, "item" : "orange", "type" : "navel", "price" : 1.39 }
Note
Because min() requires an index on a field, and forces the query to use this index, you may prefer the $gte operator for the query if possible. Consider the following example:
db.products.find( { _id: 7 } ).min( { price: 1.39 } )
The query will use the index on the price field, even if the index on _id may be better.
min() exists primarily to support the mongos (sharding) process.
If you use min() with max() to specify a range, the index bounds specified in min() and max() must both refer to the keys of the same index.
Returns: The next document in the cursor returned by the db.collection.find() method. See cursor.hasNext() for related functionality.
Append readPref() to a cursor to control how the client routes the query to members of the replica set.
The mode string should be one of:
The tagSet parameter, if given, should consist of an array of tag set objects for filtering secondary read operations. For example, a secondary member tagged { dc: 'ny', rack: 2, size: 'large' } will match the tag set { dc: 'ny', rack: 2 }. Clients match tag sets first in the order they appear in the read preference specification. You may specify an empty tag set {} as the last element to default to any available secondary. See the tag sets documentation for more information.
Note
You must apply cursor.readPref() to the cursor before retrieving any documents from the database.
Returns: A modified cursor object that contains documents with appended information that describes the on-disk location of the document.
See also
$showDiskLoc for related functionality.
Returns: A count of the number of documents that match the db.collection.find() query after applying any cursor.skip() and cursor.limit() methods.
Call the cursor.skip() method on a cursor to control where MongoDB begins returning results. This approach may be useful in implementing “paged” results.
Note
You must apply cursor.skip() to the cursor before retrieving any documents from the database.
Consider the following JavaScript function as an example of the skip function:
function printStudents(pageNumber, nPerPage) {
print("Page: " + pageNumber);
db.students.find().skip((pageNumber-1)*nPerPage).limit(nPerPage).forEach( function(student) { print(student.name + "<p>"); } );
}
The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get the offset or skip position before beginning to return results. As the offset (e.g. pageNumber above) increases, cursor.skip() will become slower and more CPU intensive. With larger collections, cursor.skip() may become IO bound.
Consider using range-based pagination for these kinds of tasks. That is, query for a range of objects, using logic within the application to determine the pagination rather than the database itself. This approach features better index utilization, if you do not need to easily jump to a specific page.
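As a sketch of the range-based approach, each page can begin after the last _id seen on the previous page, the analog of db.students.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(nPerPage). The following plain-JavaScript model (nextPage is a hypothetical helper, not shell code; it assumes the documents are already ordered by _id) illustrates the idea:

```javascript
// Range-based pagination sketch: start each page after the last _id of the
// previous page instead of skipping a growing offset.
function nextPage(docs, lastId, nPerPage) {
  return docs
    .filter(function (doc) { return lastId === null || doc._id > lastId; })
    .slice(0, nPerPage);
}

var docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }, { _id: 5 }];
var page1 = nextPage(docs, null, 2);
var page2 = nextPage(docs, page1[page1.length - 1]._id, 2);
```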
Append the cursor.snapshot() method to a cursor to toggle the “snapshot” mode. This ensures that the query will not return a document multiple times, even if intervening write operations result in a move of the document due to the growth in document size.
Warning
The snapshot() does not guarantee isolation from insertions or deletions.
The cursor.snapshot() traverses the index on the _id field. As such, snapshot() cannot be used with sort() or hint().
Queries with results of less than 1 megabyte are effectively snapshotted.
Append the sort() method to a cursor to control the order in which the query returns matching documents. For each field in the sort document, if the field's corresponding value is positive, then sort() returns query results in ascending order for that attribute; if the field's corresponding value is negative, then sort() returns query results in descending order.
Note
You must apply cursor.sort() to the cursor before retrieving any documents from the database.
Consider the following example:
db.collection.find().sort( { age: -1 } );
Here, the query returns all documents in collection sorted by the age field in descending order. Specify a value of negative one (e.g. -1), as above, to sort in descending order or a positive value (e.g. 1) to sort in ascending order.
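The 1/-1 values map onto an ordinary comparison. As an illustrative sketch in plain JavaScript (not shell code), the comparator implied by { age: -1 } looks like this:

```javascript
// Comparator sketch for a single-field sort document: direction 1 keeps
// the natural comparison, -1 reverses it.
function comparator(field, direction) {
  return function (a, b) {
    if (a[field] < b[field]) return -1 * direction;
    if (a[field] > b[field]) return 1 * direction;
    return 0;
  };
}

var users = [{ age: 25 }, { age: 40 }, { age: 31 }];
users.sort(comparator("age", -1));   // descending, as with sort({ age: -1 })
```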
Unless you have an index for the specified key pattern, use cursor.sort() in conjunction with cursor.limit() to avoid requiring MongoDB to perform a large, in-memory sort. cursor.limit() increases the speed and reduces the amount of memory required to return this query by way of an optimized algorithm.
Warning
The sort function requires that the entire sort be able to complete within 32 megabytes. When the sort option consumes more than 32 megabytes, MongoDB will return an error. Use cursor.limit(), or create an index on the field that you’re sorting to avoid this error.
The $natural parameter returns items according to their order on disk. Consider the following query:
db.collection.find().sort( { $natural: -1 } )
This will return documents in the reverse of the order on disk. Typically, the order of documents on disk reflects insertion order, except when documents move internally because of document growth due to update operations.
Use this function to create new database users, by specifying a username and password as arguments to the command. If you want to restrict the user to read-only privileges, supply true as the third argument; this argument defaults to false.
Allows a user to authenticate to the database from within the shell. Alternatively use mongo --username and --password to specify authentication credentials.
Use this function to copy a database from a remote host to the current database. The command assumes that the remote database has the same name as the current database. For example, to clone a database named importdb on a host named hostname, issue the following:
use importdb
db.cloneDatabase("hostname");
New databases are implicitly created, so the current host does not need to have a database named importdb for this command to succeed.
This function provides a wrapper around the MongoDB database command “clone.” The copydb database command provides related functionality.
New in version 2.1.0.
Always call the db.collection.aggregate() method on a collection object.
Consider the following example from the aggregation documentation.
db.article.aggregate(
{ $project : {
author : 1,
tags : 1,
} },
{ $unwind : "$tags" },
{ $group : {
_id : { tags : 1 },
authors : { $addToSet : "$author" }
} }
);
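To make the pipeline concrete, the $unwind stage above emits one document per element of the tags array. A plain-JavaScript sketch of that behavior (illustrative only; unwind is a hypothetical helper, not the server implementation):

```javascript
// Sketch of $unwind: one output document per array element, with the array
// field replaced by the single element.
function unwind(docs, field) {
  var out = [];
  docs.forEach(function (doc) {
    (doc[field] || []).forEach(function (value) {
      var copy = {};
      for (var k in doc) { copy[k] = doc[k]; }
      copy[field] = value;
      out.push(copy);
    });
  });
  return out;
}

var unwound = unwind([{ author: "dave", tags: ["fun", "good"] }], "tags");
```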
See also
“aggregate,” “Aggregation Framework,” and “Aggregation Framework Reference.”
The db.collection.count() method is a shell wrapper that returns the count of documents that would match a find() query; i.e., the db.collection.count() method is equivalent to:
db.collection.find(<query>).count();
This operation does not actually perform the find(); instead, the operation counts the results that would be returned by the find().
The db.collection.count() method can accept the following argument:
Consider the following examples of the db.collection.count() method:
Count the number of all documents in the orders collection:
db.orders.count()
The query is equivalent to the following:
db.orders.find().count()
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012'):
db.orders.count( { ord_dt: { $gt: new Date('01/01/2012') } } )
The query is equivalent to the following:
db.orders.find( { ord_dt: { $gt: new Date('01/01/2012') } } ).count()
Deprecated since version 1.8.
The ensureIndex() method is the preferred way to create indexes on collections.
| Returns: | The size of the collection. This method provides a wrapper around the size output of the collStats (i.e. db.collection.stats()) command. |
|---|
The db.collection.distinct() method finds the distinct values for a specified field across a single collection and returns the results in an array. The method accepts the following argument:
Consider the following examples of the db.collection.distinct() method:
Return an array of the distinct values of the field ord_dt from all documents in the orders collection:
db.orders.distinct( 'ord_dt' )
Return an array of the distinct values of the field sku in the subdocument item from all documents in the orders collection:
db.orders.distinct( 'item.sku' )
Return an array of the distinct values of the field ord_dt from the documents in the orders collection where the price is greater than 10:
db.orders.distinct( 'ord_dt',
{ price: { $gt: 10 } }
)
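Conceptually, distinct collects one copy of each value of the named field across the matching documents. A plain-JavaScript sketch of that behavior over a flat field (illustrative; distinctValues is a hypothetical helper):

```javascript
// Sketch of distinct(): keep the first occurrence of each value.
function distinctValues(docs, field) {
  var values = [];
  docs.forEach(function (doc) {
    if (values.indexOf(doc[field]) === -1) values.push(doc[field]);
  });
  return values;
}

var skus = distinctValues(
  [{ sku: "abc123" }, { sku: "abc456" }, { sku: "abc123" }],
  "sku"
);
```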
Call the db.collection.drop() method on a collection to drop it from the database.
db.collection.drop() takes no arguments and will produce an error if called with any arguments.
Drops or removes the specified index from a collection. The db.collection.dropIndex() method provides a wrapper around the dropIndexes command.
The db.collection.dropIndex() method takes the following parameter:
The db.collection.dropIndex() method cannot drop the _id index. Use the db.collection.getIndexes() method to view all indexes on a collection.
Consider the following examples of the db.collection.dropIndex() method that assumes the following indexes on the collection pets:
> db.pets.getIndexes()
[
{ "v" : 1,
"key" : { "_id" : 1 },
"ns" : "test.pets",
"name" : "_id_"
},
{
"v" : 1,
"key" : { "cat" : -1 },
"ns" : "test.pets",
"name" : "catIdx"
},
{
"v" : 1,
"key" : { "cat" : 1, "dog" : -1 },
"ns" : "test.pets",
"name" : "cat_1_dog_-1"
}
]
To drop the index on the field cat, you must use the index name catIdx:
db.pets.dropIndex( 'catIdx' )
To drop the index on the fields cat and dog, you use either the index name cat_1_dog_-1 or the key { "cat" : 1, "dog" : -1 }:
db.pets.dropIndex( 'cat_1_dog_-1' )
db.pets.dropIndex( { cat : 1, dog : -1 } )
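A default index name such as cat_1_dog_-1 above is derived from the key document by joining each field name with its sort direction. A sketch of that derivation in plain JavaScript (defaultIndexName is an illustrative helper, not a shell built-in):

```javascript
// Sketch: default index name = field/direction pairs joined with "_".
function defaultIndexName(keys) {
  var parts = [];
  for (var field in keys) { parts.push(field + "_" + keys[field]); }
  return parts.join("_");
}

var name = defaultIndexName({ cat: 1, dog: -1 });   // "cat_1_dog_-1"
```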
Drops all indexes other than the required index on the _id field. Only call dropIndexes() as a method on a collection object.
Warning
Index names, including their full namespace (i.e. database.collection) can be no longer than 128 characters. See the db.collection.getIndexes() field “name” for the names of existing indexes.
See
The Indexes section of this manual for full documentation of indexes and indexing in MongoDB.
Creates an index on the specified field, if that index does not already exist. If the keys document specifies more than one field, then db.collection.ensureIndex() creates a compound index. For example:
db.collection.ensureIndex({ [key]: 1})
This command creates an index, in ascending order, on the field [key]. To specify a compound index use the following form:
db.collection.ensureIndex({ [key]: 1, [key1]: -1 })
This command creates a compound index on the key field (in ascending order) and the key1 field (in descending order).
Note
Typically the order of an index is only important when doing cursor.sort() operations on the indexed fields.
The available options, possible values, and the default settings are as follows:
| Option | Value | Default |
|---|---|---|
| background | true or false | false |
| unique | true or false | false |
| name | string | none |
| cache | true or false | true |
| dropDups | true or false | false |
| sparse | true or false | false |
| expireAfterSeconds | integer | none |
| v | index version | 1 |
Please be aware of the following behaviors of ensureIndex():
To add or change index options you must drop the index using db.collection.dropIndex() and issue another ensureIndex() operation with the new options.
If you create an index with one set of options and then issue the ensureIndex() method with the same index fields but different options, without first dropping the index, ensureIndex() will not rebuild the existing index with the new options.
If you call multiple ensureIndex() methods with the same index specification at the same time, only the first operation will succeed; all other operations will have no effect.
Non-background indexing operations will block all other operations on a database.
You cannot stop a foreground index build once it’s begun. See the Monitor and Control Index Building for more information.
| [1] | The default index version depends on the version of mongod running when creating the index. Before version 2.0, this value was 0; versions 2.0 and later use version 1. |
The find() method selects documents in a collection and returns a cursor to the selected documents.
The find() method takes the following parameters.
Returns: A cursor to the documents that match the query criteria and contain the projection fields.
Note
In the mongo shell, you can access the returned documents directly without explicitly using the JavaScript cursor handling method. Executing the query directly on the mongo shell prompt automatically iterates the cursor to display up to the first 20 documents. Type it to continue iteration.
Consider the following examples of the find() method:
To select all documents in a collection, call the find() method with no parameters:
db.products.find()
This operation returns all the documents with all the fields from the collection products. By default, in the mongo shell, the cursor returns the first batch of 20 matching documents. In the mongo shell, iterate through the next batch by typing it. Use the appropriate cursor handling mechanism for your specific language driver.
To select the documents that match a selection criteria, call the find() method with the query criteria:
db.products.find( { qty: { $gt: 25 } } )
This operation returns all the documents from the collection products where qty is greater than 25, including all fields.
To select the documents that match a selection criteria and return, or project only certain fields into the result set, call the find() method with the query criteria and the projection parameter, as in the following example:
db.products.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 } )
This operation returns all the documents from the collection products where qty is greater than 25. The documents in the result set only include the _id, item, and qty fields using “inclusion” projection. find() always returns the _id field, even when not explicitly included:
{ "_id" : 11, "item" : "pencil", "qty" : 50 }
{ "_id" : ObjectId("50634d86be4617f17bb159cd"), "item" : "bottle", "qty" : 30 }
{ "_id" : ObjectId("50634dbcbe4617f17bb159d0"), "item" : "paper", "qty" : 100 }
To select the documents that match a query criteria and exclude a set of fields from the resulting documents, call the find() method with the query criteria and the projection parameter using the exclude syntax:
db.products.find( { qty: { $gt: 25 } }, { _id: 0, qty: 0 } )
The query will return all the documents from the collection products where qty is greater than 25. The documents in the result set will contain all fields except the _id and qty fields, as in the following:
{ "item" : "pencil", "type" : "no.2" }
{ "item" : "bottle", "type" : "blue" }
{ "item" : "paper" }
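The inclusion form of projection can be sketched in plain JavaScript: copy only the named fields, and keep _id unless it is explicitly excluded (projectInclude is an illustrative helper, not part of the shell):

```javascript
// Sketch of "inclusion" projection: keep the listed fields plus _id.
function projectInclude(doc, fields) {
  var out = {};
  if ("_id" in doc) out._id = doc._id;   // _id is returned unless excluded
  fields.forEach(function (f) { if (f in doc) out[f] = doc[f]; });
  return out;
}

var projected = projectInclude(
  { _id: 11, item: "pencil", qty: 50, type: "no.2" },
  ["item", "qty"]
);
```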
The db.collection.findAndModify() method atomically modifies and returns a single document. By default, the returned document does not include the modifications made on the update. To return the document with the modifications made on the update, use the new option.
The db.collection.findAndModify() method takes a document parameter with the following subdocument fields:
Consider the following example:
db.people.findAndModify( {
query: { name: "Tom", state: "active", rating: { $gt: 10 } },
sort: { rating: 1 },
update: { $inc: { score: 1 } }
} );
This command performs the following actions:
Warning
When using findAndModify in a sharded environment, the query must contain the shard key for all operations against the sharded cluster. findAndModify operations issued against mongos instances for non-sharded collections function normally.
Returns: One document that satisfies the query specified as the argument to this method.
Returns only one document that satisfies the specified query. If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on the disk. In capped collections, natural order is the same as insertion order.
Returns an array that holds a list of documents that identify and describe the existing indexes on the collection. You must call db.collection.getIndexes() on a collection. For example:
db.collection.getIndexes()
Change collection to the name of the collection whose indexes you want to view.
The db.collection.getIndexes() items consist of the following fields:
Holds the version of the index.
The index version depends on the version of mongod that created the index. Before version 2.0 of MongoDB, this value was 0; versions 2.0 and later use version 1.
Contains a document holding the keys held in the index, and the order of the index. Indexes may be either descending or ascending order. A value of negative one (e.g. -1) indicates an index sorted in descending order while a positive value (e.g. 1) indicates an index sorted in an ascending order.
The namespace context for the index.
A unique name for the index comprised of the field names and orders of all keys.
The db.collection.group() method groups documents in a collection by the specified keys and performs simple aggregation functions such as computing counts and sums. The method is analogous to a SELECT .. GROUP BY statement in SQL. The group() method returns an array.
The db.collection.group() method accepts a single document that contains the following:
The db.collection.group() method is a shell wrapper for the group command; however, the db.collection.group() method takes the keyf field and the reduce field whereas the group command takes the $keyf field and the $reduce field.
Consider the following examples of the db.collection.group() method:
The examples assume an orders collection with documents of the following prototype:
{
_id: ObjectId("5085a95c8fada716c89d0021"),
ord_dt: ISODate("2012-07-01T04:00:00Z"),
ship_dt: ISODate("2012-07-02T04:00:00Z"),
item: { sku: "abc123",
price: 1.99,
uom: "pcs",
qty: 25 }
}
The following example groups by the ord_dt and item.sku fields those documents that have ord_dt greater than 01/01/2012:
db.orders.group( {
key: { ord_dt: 1, 'item.sku': 1 },
cond: { ord_dt: { $gt: new Date( '01/01/2012' ) } },
reduce: function ( curr, result ) { },
initial: { }
} )
The result is an array of documents that contain the group by fields:
[ { "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc456"},
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "bcd123"},
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "efg456"},
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "efg456"},
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "ijk123"},
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc456"},
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc123"},
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc456"} ]
The method call is analogous to the SQL statement:
SELECT ord_dt, item_sku
FROM orders
WHERE ord_dt > '01/01/2012'
GROUP BY ord_dt, item_sku
The following example groups by the ord_dt and item.sku fields, those documents that have ord_dt greater than 01/01/2012 and calculates the sum of the qty field for each grouping:
db.orders.group( {
key: { ord_dt: 1, 'item.sku': 1 },
cond: { ord_dt: { $gt: new Date( '01/01/2012' ) } },
reduce: function ( curr, result ) {
result.total += curr.item.qty;
},
initial: { total : 0 }
} )
The result is an array of documents that contain the group by fields and the calculated aggregation field:
[ { "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc123", "total" : 25 },
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "abc456", "total" : 25 },
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "bcd123", "total" : 10 },
{ "ord_dt" : ISODate("2012-07-01T04:00:00Z"), "item.sku" : "efg456", "total" : 10 },
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "abc123", "total" : 25 },
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "efg456", "total" : 15 },
{ "ord_dt" : ISODate("2012-06-01T04:00:00Z"), "item.sku" : "ijk123", "total" : 20 },
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc123", "total" : 45 },
{ "ord_dt" : ISODate("2012-05-01T04:00:00Z"), "item.sku" : "abc456", "total" : 25 },
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc123", "total" : 25 },
{ "ord_dt" : ISODate("2012-06-08T04:00:00Z"), "item.sku" : "abc456", "total" : 25 } ]
The method call is analogous to the SQL statement:
SELECT ord_dt, item_sku, SUM(item_qty) as total
FROM orders
WHERE ord_dt > '01/01/2012'
GROUP BY ord_dt, item_sku
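The semantics of the reduce function can be sketched in plain JavaScript: for each document in a group, group() calls reduce(curr, result), mutating a shared result document that starts as a copy of initial. The helper and sample documents below are hypothetical, not part of the mongo shell API:

```javascript
// Sketch of how group() applies the reduce function within one group.
function applyGroupReduce(docs, initial, reduce) {
  var result = JSON.parse(JSON.stringify(initial));  // fresh copy of `initial`
  docs.forEach(function (curr) { reduce(curr, result); });
  return result;
}

// Hypothetical documents belonging to one { ord_dt, item.sku } group:
var groupDocs = [ { item: { qty: 10 } }, { item: { qty: 15 } } ];

var totals = applyGroupReduce(groupDocs, { total: 0 },
  function (curr, result) { result.total += curr.item.qty; });
// totals accumulates the qty values of the group
```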
The following example groups by the calculated day_of_week field those documents that have ord_dt greater than 01/01/2012, and calculates the sum, count, and average of the qty field for each grouping:
db.orders.group( {
   keyf: function(doc) {
      return { day_of_week: doc.ord_dt.getDay() };
   },
   cond: { ord_dt: { $gt: new Date( '01/01/2012' ) } },
   reduce: function ( curr, result ) {
      result.total += curr.item.qty;
      result.count++;
   },
   initial: { total : 0, count: 0 },
   finalize: function(result) {
      var weekdays = [ "Sunday", "Monday", "Tuesday",
                       "Wednesday", "Thursday",
                       "Friday", "Saturday" ];
      result.day_of_week = weekdays[result.day_of_week];
      result.avg = Math.round(result.total / result.count);
   }
} )
The result is an array of documents that contain the group by fields and the calculated aggregation field:
[ { "day_of_week" : "Sunday", "total" : 70, "count" : 4, "avg" : 18 },
{ "day_of_week" : "Friday", "total" : 110, "count" : 6, "avg" : 18 },
{ "day_of_week" : "Tuesday", "total" : 70, "count" : 3, "avg" : 23 } ]
The insert() method inserts a document or documents into a collection.
Changed in version 2.2: The insert() method can accept an array of documents to perform a bulk insert of the documents into the collection.
Consider the following behaviors of the insert() method:
The insert() method takes one of the following parameters:
Consider the following examples of the insert() method:
To insert a single document and have MongoDB generate the unique _id, omit the _id field in the document and pass the document to the insert() method as in the following:
db.products.insert( { item: "card", qty: 15 } )
This operation inserts a new document into the products collection with the item field set to card, the qty field set to 15, and the _id field set to a unique ObjectId:
{ "_id" : ObjectId("5063114bd386d8fadbd6b004"), "item" : "card", "qty" : 15 }
Note
Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id field and generate the ObjectId.
To insert a single document, with a custom _id field, include the _id field set to a unique identifier and pass the document to the insert() method as follows:
db.products.insert( { _id: 10, item: "box", qty: 20 } )
This operation inserts a new document in the products collection with the _id field set to 10, the item field set to box, the qty field set to 20:
{ "_id" : 10, "item" : "box", "qty" : 20 }
To insert multiple documents, pass an array of documents to the insert() method as in the following:
db.products.insert( [ { _id: 11, item: "pencil", qty: 50, type: "no.2" },
                      { item: "pen", qty: 20 },
                      { item: "eraser", qty: 25 } ] )
The operation will insert three documents into the products collection:
{ "_id" : 11, "item" : "pencil", "qty" : 50, "type" : "no.2" }
{ "_id" : ObjectId("50631bc0be4617f17bb159ca"), "item" : "pen", "qty" : 20 }
{ "_id" : ObjectId("50631bc0be4617f17bb159cb"), "item" : "eraser", "qty" : 25 }
Returns: true if the collection is a capped collection, otherwise false.
The db.collection.mapReduce() method provides a wrapper around the mapReduce command.
db.collection.mapReduce(
   mapfunction,
   reducefunction,
   {
      out: <collection>,
      query: <document>,
      sort: <document>,
      limit: <number>,
      finalize: <function>,
      scope: <document>,
      jsMode: <boolean>,
      verbose: <boolean>
   }
)
db.collection.mapReduce() takes the following parameters:
Consider the following map-reduce operations on a collection orders that contains documents of the following prototype:
{
   _id: ObjectId("50a8240b927d5d8b5891743c"),
   cust_id: "abc123",
   ord_date: new Date("Oct 04, 2012"),
   status: 'A',
   price: 250,
   items: [ { sku: "mmm", qty: 5, price: 2.5 },
            { sku: "nnn", qty: 5, price: 2.5 } ]
}
Perform a map-reduce operation on the orders collection to group by cust_id and calculate the sum of the price for each cust_id:
Define the map function to process each input document:
var mapFunction1 = function() {
   emit(this.cust_id, this.price);
};
Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
var reduceFunction1 = function(keyCustId, valuesPrices) {
   return Array.sum(valuesPrices);
};
Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function.
db.orders.mapReduce(
   mapFunction1,
   reduceFunction1,
   { out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace the contents with the results of this map-reduce operation.
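Note that Array.sum() used in reduceFunction1 is a helper available in the mongo shell but not in standalone JavaScript. In plain JavaScript, a sketch of the same reduce function uses Array.prototype.reduce:

```javascript
// Plain-JavaScript equivalent of the mongo shell's Array.sum() helper,
// applied to the emitted price values for one cust_id key.
var reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce(function (sum, price) {
    return sum + price;
  }, 0);
};

var totalPrice = reduceFunction1("abc123", [ 250, 200, 50 ]);  // sums the prices
```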
In this example you will perform a map-reduce operation on the orders collection, for all documents that have an ord_date value greater than 01/01/2012. The operation groups by the item.sku field, and for each sku calculates the number of orders and the total quantity ordered. The operation concludes by calculating the average quantity per order for each sku value:
Define the map function to process each input document:
var mapFunction2 = function() {
   for (var idx = 0; idx < this.items.length; idx++) {
      var key = this.items[idx].sku;
      var value = {
         count: 1,
         qty: this.items[idx].qty
      };
      emit(key, value);
   }
};
Define the corresponding reduce function with two arguments keySKU and valuesCountObjects:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
   var reducedValue = { count: 0, qty: 0 };
   for (var idx = 0; idx < valuesCountObjects.length; idx++) {
      reducedValue.count += valuesCountObjects[idx].count;
      reducedValue.qty += valuesCountObjects[idx].qty;
   }
   return reducedValue;
};
Define a finalize function with two arguments key and reducedValue. The function modifies the reducedValue object to add a computed field named average and returns the modified object:
var finalizeFunction2 = function (key, reducedValue) {
   reducedValue.average = reducedValue.qty/reducedValue.count;
   return reducedValue;
};
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions.
db.orders.mapReduce( mapFunction2,
                     reduceFunction2,
                     {
                        out: { merge: "map_reduce_example" },
                        query: { ord_date: { $gt: new Date('01/01/2012') } },
                        finalize: finalizeFunction2
                     }
                   )
This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). Then it outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will merge the existing contents with the results of this map-reduce operation.
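Because the map-reduce reduce and finalize functions are ordinary JavaScript, you can exercise them with hypothetical emitted values before running the server-side operation. The functions are redefined below so the sketch is self-contained; the sample values are assumptions for illustration:

```javascript
// Same logic as the reduceFunction2/finalizeFunction2 defined above.
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

var finalizeFunction2 = function (key, reducedValue) {
  reducedValue.average = reducedValue.qty / reducedValue.count;
  return reducedValue;
};

// Hypothetical values emitted by the map function for sku "mmm":
var reduced = reduceFunction2("mmm", [ { count: 1, qty: 5 },
                                       { count: 1, qty: 10 } ]);
var finalDoc = finalizeFunction2("mmm", reduced);
// finalDoc holds the combined count, total qty, and average qty per order
```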
For more information and examples, see the Map-Reduce page.
This method drops all indexes and recreates them. This operation may be expensive for collections that have a large amount of data and/or a large number of indexes.
Call this method, which takes no arguments, on a collection object. For example:
db.collection.reIndex()
Change collection to the name of the collection whose indexes you want to rebuild.
The remove method removes documents from a collection.
The remove() method can take the following parameters:
Note
You cannot apply the remove() method to a capped collection.
Consider the following examples of the remove method.
To remove all documents in a collection, call the remove method with no parameters:
db.products.remove()
This operation will remove all the documents from the collection products.
To remove the documents that match a deletion criteria, call the remove method with the query criteria:
db.products.remove( { qty: { $gt: 20 } } )
This operation removes all the documents from the collection products where qty is greater than 20.
To remove the first document that matches a deletion criteria, call the remove method with the query criteria and the justOne parameter set to true or 1:
db.products.remove( { qty: { $gt: 20 } }, true )
This operation removes the first document from the collection products where qty is greater than 20.
Note
If the query argument to the remove() method matches multiple documents in the collection, the delete operation may interleave with other write operations to that collection. For an unsharded collection, you have the option to override this behavior with the $atomic isolation operator, effectively isolating the delete operation and blocking other write operations during the delete. To isolate the query, include $atomic: 1 in the query parameter as in the following example:
db.products.remove( { qty: { $gt: 20 }, $atomic: 1 } )
db.collection.renameCollection() provides a helper for the renameCollection database command in the mongo shell to rename existing collections.
Call the db.collection.renameCollection() method on a collection object, to rename a collection. Specify the new name of the collection as an argument. For example:
db.rrecord.renameCollection("record")
This operation will rename the rrecord collection to record. If the target name (i.e. record) is the name of an existing collection, then the operation will fail.
Consider the following limitations:
The db.collection.renameCollection() method operates within a collection by changing the metadata associated with a given collection.
Refer to the documentation renameCollection for additional warnings and messages.
Warning
The db.collection.renameCollection() method and renameCollection command will invalidate open cursors which interrupts queries that are currently returning data.
The save() method updates an existing document or inserts a document depending on the parameter.
The save() method takes the following parameter:
Consider the following examples of the save() method:
Pass to the save() method a document without an _id field to insert the document into the collection and have MongoDB generate the unique _id, as in the following:
db.products.save( { item: "book", qty: 40 } )
This operation inserts a new document into the products collection with the item field set to book, the qty field set to 40, and the _id field set to a unique ObjectId:
{ "_id" : ObjectId("50691737d386d8fadbd6b01d"), "item" : "book", "qty" : 40 }
Note
Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id field and generate the ObjectId.
Pass to the save() method a document with an _id field that holds a value that does not exist in the collection to insert the document with that value as the _id into the collection, as in the following:
db.products.save( { _id: 100, item: "water", qty: 30 } )
This operation inserts a new document into the products collection with the _id field set to 100, the item field set to water, and the field qty set to 30:
{ "_id" : 100, "item" : "water", "qty" : 30 }
Pass to the save() method a document with the _id field set to a value in the collection to replace all fields and values of the matching document with the new fields and values, as in the following:
db.products.save( { _id:100, item:"juice" } )
This operation replaces the existing document with a value of 100 in the _id field. The updated document will resemble the following:
{ "_id" : 100, "item" : "juice" }
Returns: A document containing statistics reflecting the state of the specified collection.
This function provides a wrapper around the database command collStats. The scale option allows you to configure how the mongo shell scales the sizes of things in the output. For example, specify a scale value of 1024 to display kilobytes rather than bytes.
Call the db.collection.stats() method on a collection object, to return statistics regarding that collection. For example, the following operation returns stats on the people collection:
db.people.stats()
See also
“Collection Statistics Reference” for an overview of the output of this command.
Returns: The amount of storage space, calculated using the number of extents, used by the collection. This method provides a wrapper around the storageSize output of the collStats (i.e. db.collection.stats()) command.
Returns: The total size of all indexes for the collection. This method provides a wrapper around the totalIndexSize output of the collStats (i.e. db.collection.stats()) command.
The update() method modifies an existing document or documents in a collection. By default the update() method updates a single document. To update all documents in the collection that match the update query criteria, specify the multi option. To insert a document if no document matches the update query criteria, specify the upsert option.
Changed in version 2.2: The mongo shell provides an updated interface that accepts the options parameter in a document format to specify multi and upsert options.
Prior to version 2.2, in the mongo shell, upsert and multi were positional boolean options:
db.collection.update(query, update, <upsert,> <multi>)
The update() method takes the following parameters:
Although update operations typically modify the values of fields, the update() method can also rename a field in a document using the $rename operator.
Consider the following examples of the update() method. These examples all use the 2.2 interface to specify options in the document form.
To update specific fields in a document, call the update() method with an update parameter using field: value pairs and expressions using update operators as in the following:
db.products.update( { item: "book", qty: { $gt: 5 } }, { $set: { x: 6 }, $inc: { y: 5} } )
This operation updates a document in the products collection that matches the query criteria, sets the value of the field x to 6, and increments the value of the field y by 5. All other fields of the document remain the same.
To replace all the fields in a document with the document as specified in the update parameter, call the update() method with an update parameter that consists of only key: value expressions, as in the following:
db.products.update( { item: "book", qty: { $gt: 5 } }, { x: 6, y: 15 } )
This operation selects a document from the products collection that matches the query criteria and sets the value of the field x to 6 and the value of the field y to 15. All other fields of the matched document are removed, except the _id field.
To update multiple documents, call the update() method and specify the multi option in the options argument, as in the following:
db.products.update( { item: "book", qty: { $gt: 5 } }, { $set: { x: 6, y: 15 } }, { multi: true } )
This operation updates all documents in the products collection that match the query criteria by setting the value of the field x to 6 and the value of the field y to 15. This operation does not affect any other fields in documents in the products collection.
You can perform the same operation by calling the update() method with the multi parameter:
db.products.update( { item: "book", qty: { $gt: 5 } }, { $set: { x: 6, y: 15 } }, false, true )
To update a document or to insert a new document if no document matches the query criteria, call the update() and specify the upsert option in the options argument, as in the following:
db.products.update( { item: "magazine", qty: { $gt: 5 } }, { $set: { x: 25, y: 50 } }, { upsert: true } )
This operation will update a matching document if one exists, or insert a new document if no document matches the query criteria.
Provides a wrapper around the validate database command. Call the db.collection.validate() method on a collection object, to validate the collection itself. Specify the full option to return full statistics.
The validation operation scans all of the data structures for correctness and returns a single document that describes the relationship between the logical collection and the physical representation of that data.
The output can provide a more in depth view of how the collection uses storage. Be aware that this command is potentially resource intensive, and may impact the performance of your MongoDB instance.
Returns: Help text for the specified database command. See the database command reference for full documentation of these commands.
Use this function to copy a specific database, named origin running on the system accessible via hostname into the local database named destination. The command creates destination databases implicitly when they do not exist. If you omit the hostname, MongoDB will copy data from one database into another on the same instance.
This function provides a wrapper around the MongoDB database command “copydb.” The clone database command provides related functionality.
Explicitly creates a new collection. Because MongoDB creates collections implicitly when referenced, this command is primarily used for creating new capped collections. In some circumstances, you may use this command to pre-allocate space for an ordinary collection.
Capped collections have maximum size or document counts that prevent them from growing beyond maximum thresholds. All capped collections must specify a maximum size, but may also specify a maximum document count. The collection will remove older documents if a collection reaches the maximum size limit before it reaches the maximum document count. Consider the following example:
db.createCollection("log", { capped : true, size : 5242880, max : 5000 } )
This command creates a collection named log with a maximum size of 5 megabytes or a maximum of 5000 documents.
The following command simply pre-allocates a 2 gigabyte, uncapped, collection named people:
db.createCollection("people", { size: 2147483648 })
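The size option in both examples is a byte count; the values above work out as follows:

```javascript
// size values passed to db.createCollection() are expressed in bytes:
var fiveMB = 5 * 1024 * 1024;         // 5242880, the capped log example
var twoGB  = 2 * 1024 * 1024 * 1024;  // 2147483648, the people example
```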
This command provides a wrapper around the database command create. See the “Capped Collections” wiki page for more information about capped collections.
Returns: A document that contains an array named inprog.
The inprog array reports the current operation in progress for the database instance. See Current Operation Reporting for full documentation of the output of db.currentOp().
db.currentOp() is only available for users with administrative privileges.
Consider the following JavaScript operations for the mongo shell that you can use to filter the output of db.currentOp() to identify specific types of operations:
Return all pending write operations:
db.currentOp().inprog.forEach(
   function(d){
      if(d.waitingForLock && d.lockType != "read")
         printjson(d)
   })
Return the active write operation:
db.currentOp().inprog.forEach(
   function(d){
      if(d.active && d.lockType == "write")
         printjson(d)
   })
Return all active read operations:
db.currentOp().inprog.forEach(
   function(d){
      if(d.active && d.lockType == "read")
         printjson(d)
   })
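The three snippets above differ only in their predicate, so the filtering itself can be factored into a small helper. The helper and sample inprog array below are hypothetical illustrations, not part of the mongo shell API:

```javascript
// Generic filter over an inprog array, as returned in db.currentOp().inprog.
function filterOps(inprog, predicate) {
  var matched = [];
  inprog.forEach(function (d) {
    if (predicate(d)) { matched.push(d); }
  });
  return matched;
}

// Hypothetical inprog entries with the fields the snippets above test:
var inprog = [
  { active: true,  lockType: "write", waitingForLock: false },
  { active: true,  lockType: "read",  waitingForLock: false },
  { active: false, lockType: "write", waitingForLock: true  }
];

var activeWrites = filterOps(inprog, function (d) {
  return d.active && d.lockType == "write";
});
var pendingWrites = filterOps(inprog, function (d) {
  return d.waitingForLock && d.lockType != "read";
});
```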
Warning
Terminate running operations with extreme caution. Only use db.killOp() to terminate operations initiated by clients and do not terminate internal database operations.
Removes the current database. Does not change the current database, so the insertion of any documents in this database will allocate a fresh set of data files.
The db.eval() method provides the ability to run JavaScript code on the MongoDB server. It is a mongo shell wrapper around the eval command.
The method accepts the following parameters:
Consider the following example of the db.eval() method:
db.eval( function(name, incAmount) {
            var doc = db.myCollection.findOne( { name : name } );
            doc = doc || { name : name , num : 0 , total : 0 , avg : 0 };
            doc.num++;
            doc.total += incAmount;
            doc.avg = doc.total / doc.num;
            db.myCollection.save( doc );
            return doc;
         },
         "<name>", 5 );
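The arithmetic the server-side function performs can be sketched locally in plain JavaScript, with the database lookup and save replaced by a document passed in directly. The helper name and sample values are hypothetical:

```javascript
// Local sketch of the running-average update the db.eval() example
// performs on the server (no database access; doc passed in directly).
function applyIncrement(doc, name, incAmount) {
  doc = doc || { name: name, num: 0, total: 0, avg: 0 };
  doc.num++;
  doc.total += incAmount;
  doc.avg = doc.total / doc.num;
  return doc;
}

var doc = applyIncrement(null, "gadget", 5);  // first call creates the document
doc = applyIncrement(doc, "gadget", 7);       // subsequent calls update the average
```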
If you want to use the server’s interpreter, you must run db.eval(). Otherwise, the mongo shell’s JavaScript interpreter evaluates functions entered directly into the shell.
If an error occurs, db.eval() throws an exception. Consider the following invalid function that uses the variable x without declaring it as an argument:
db.eval( function() { return x + x; }, 3 );
The statement will result in the following exception:
{
   "errno" : -3,
   "errmsg" : "invoke failed: JS Error: ReferenceError: x is not defined nofile_b:1",
   "ok" : 0
}
Forces the mongod to flush all pending write operations to the disk and locks the entire mongod instance to prevent additional writes until the user releases the lock with the db.fsyncUnlock() command. db.fsyncLock() is an administrative command.
This command provides a simple wrapper around a fsync database command with the following syntax:
{ fsync: 1, lock: true }
This function locks the database and creates a window for backup operations.
Note
The database cannot be locked with db.fsyncLock() while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock(). Disable profiling using db.setProfilingLevel() as follows in the mongo shell:
db.setProfilingLevel(0)
Unlocks a mongod instance to allow writes and reverses the operation of a db.fsyncLock() operation. Typically you will use db.fsyncUnlock() following a database backup operation.
db.fsyncUnlock() is an administrative command.
Returns: A collection.
Use this command to obtain a handle on a collection whose name might interact with the shell itself, including collections with names that begin with _ or mirror the database commands.
Returns: An array containing all collections in the existing database.
Returns: The last error message string.
Sets the level of write concern for confirming the success of write operations.
See
getLastError for all options, Write Concern for a conceptual overview, Write Operations for information about all write operations in MongoDB, and Replica Set Write Concern for special considerations related to write concern for replica sets.
Returns: The current database connection.
db.getMongo() runs when the shell initiates. Use this command to test that the mongo shell has a connection to the proper database instance.
Returns: A status document, containing the errors.
Deprecated since version 1.6.
This output reports all errors since the last time the database received a resetError (also db.resetError()) command.
This method provides a wrapper around the getPrevError command.
This method provides a wrapper around the database command “profile” and returns the current profiling level.
Deprecated since version 1.8.4: Use db.getProfilingStatus() for related functionality.
Returns: A status document.
The output reports statistics related to replication.
See also
“Replication Info Reference” for full documentation of this output.
Used to return another database without modifying the db variable in the shell environment.
Returns a status document with fields that include the ismaster field, which reports whether the current node is the primary node, as well as a report of a subset of the current replica set configuration.
This function provides a wrapper around the database command isMaster.
Terminates the specified operation. Use db.currentOp() to find operations and their corresponding ids. See Current Operation Reporting for full documentation of the output of db.currentOp().
Warning
Terminate running operations with extreme caution. Only use db.killOp() to terminate operations initiated by clients and do not terminate internal database operations.
Provides a list of all database commands. See the “Command Reference” document for a more extensive index of these options.
db.loadServerScripts() loads all scripts in the system.js collection for the current database into the mongo shell session.
Documents in the system.js collection have the following prototype form:
{ _id : "<name>" , value : <function> }
The documents in the system.js collection provide functions that your applications can use in any JavaScript context with MongoDB in this database. These contexts include $where clauses and mapReduce operations.
Provides a wrapper around the db.collection.stats() method. Returns statistics from every collection separated by three hyphen characters.
Provides a formatted report of the status of a replica set from the perspective of the primary set member. See the “Replica Set Status Reference” for more information regarding the contents of this output.
This function will return db.printSlaveReplicationInfo() if issued against a secondary set member.
Provides a formatted report of the sharding configuration and the information regarding existing chunks in a sharded cluster.
Only use db.printShardingStatus() when connected to a mongos instance.
This method is a wrapper around the printShardingStatus command.
Provides a formatted report of the status of a replica set from the perspective of the secondary set member. See the “Replica Set Status Reference” for more information regarding the contents of this output.
Removes the specified username from the database.
Warning
In general, if you have an intact copy of your data, such as would exist on a very recent backup or an intact member of a replica set, do not use repairDatabase or related options like db.repairDatabase() in the mongo shell or mongod --repair. Restore from an intact copy of your data.
Note
When using journaling, there is almost never any need to run repairDatabase. In the event of an unclean shutdown, the server will be able to restore the data files to a pristine state automatically.
db.repairDatabase() provides a wrapper around the database command repairDatabase, and has the same effect as the run-time option mongod --repair option, limited to only the current database. See repairDatabase for full documentation.
Deprecated since version 1.6.
Resets the error message returned by db.getPrevError or getPrevError. Provides a wrapper around the resetError command.
Provides a helper to run specified database commands. This is the preferred method to issue database commands, as it provides a consistent interface between the shell and drivers.
Returns a document that provides an overview of the database process’s state.
This command provides a wrapper around the database command serverStatus.
See also
“Server Status Reference” for complete documentation of the output of this function.
Modifies the current database profiler level. This allows administrators to capture data regarding performance. The database profiling system can impact performance and can allow the server to write the contents of queries to the log, which might have information security implications for your deployment.
The following profiling levels are available:
| Level | Setting |
| 0 | Off. No profiling. |
| 1 | On. Only includes slow operations. |
| 2 | On. Includes all operations. |
Also configure the slowms option to set the threshold for the profiler to consider a query “slow.” Specify this value in milliseconds to override the default.
This command provides a wrapper around the database command profile.
mongod writes the output of the database profiler to the system.profile collection.
mongod prints information about queries that take longer than the slowms threshold to the log even when the database profiler is not active.
Note
The database cannot be locked with db.fsyncLock() while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock(). Disable profiling using db.setProfilingLevel() as follows in the mongo shell:
db.setProfilingLevel(0)
Shuts down the current mongod or mongos process cleanly and safely.
This operation fails when the current database is not the admin database.
This command provides a wrapper around the shutdown command.
Returns: A document that contains statistics reflecting the database system's state.
This function provides a wrapper around the database command “dbStats”. The scale option allows you to configure how the mongo shell scales the sizes of things in the output. For example, specify a scale value of 1024 to display kilobytes rather than bytes.
See the “Database Statistics Reference” document for an overview of this output.
Note
The scale factor rounds values to whole numbers. This can produce unpredictable and unexpected results in some situations.
Returns: null
For internal use.
See SERVER-4902 for more information.
This method returns information regarding the state of data in a sharded cluster that is useful when diagnosing underlying issues with a sharded cluster.
For internal and diagnostic use only.
Returns: The hostname of the system running the mongo shell process.
Returns: boolean.
Returns true if the server is running on a Windows system, or false if the server is running on a Unix or Linux system.
Returns an array, containing one document per object in the directory. This function operates in the context of the mongo process. The included fields are:
Returns a string which contains the name of the object.
Returns true or false if the object is a directory.
Returns the size of the object in bytes. This field is only present for files.
Parameters: file (string) – Specify a path and file name containing JavaScript.
This native function loads and runs a JavaScript file into the current shell environment. To run JavaScript with the mongo shell, you can either:
Specify filenames passed to the load() function relative to the current directory of the mongo shell session. Check the current directory using the pwd() function.
Returns a list of the files in the current directory.
This function returns with output relative to the current shell session, and does not impact the server.
Creates a directory at the specified path. This command will create the entire path specified if the enclosing directory or directories do not already exist.
Equivalent to mkdir -p with BSD or GNU utilities.
For the current session, this command permits read operations from non-master (i.e. slave or secondary) instances. Practically, use this method in the following form:
db.getMongo().setSlaveOk()
Indicates that “eventually consistent” read operations are acceptable for the current application. This function provides the same functionality as rs.slaveOk().
See the readPref() method for more fine-grained control over read preference in the mongo shell.
Returns the current directory.
This function returns with output relative to the current shell session, and does not impact the server.
Exits the current shell session.
| Returns: | A random number between 0 and 1. |
|---|
This function provides functionality similar to the Math.random() method from the JavaScript standard library.
| Returns: | boolean. |
|---|
Removes the specified file from the local file system.
Specify one of the following forms:
Provides a simple method to add a member to an existing replica set. You can specify the new host either as a hostname string (with an optional port) or as a member configuration document.
This function will briefly disconnect the shell and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
rs.add() provides a wrapper around some of the functionality of the “replSetReconfig” database command and the corresponding shell helper rs.reconfig(). See the Replica Set Configuration document for full documentation of all replica set configuration options.
Example
To add a mongod accessible on the default port 27017 running on the host mongodb3.example.net, use the following rs.add() invocation:
rs.add('mongodb3.example.net:27017')
If mongodb3.example.net is an arbiter, use the following form:
rs.add('mongodb3.example.net:27017', true)
To add mongodb3.example.net as a secondary-only member of the set, use the following form of rs.add():
rs.add( { "host": "mongodb3.example.net:27017", "priority": 0 } )
See the Replica Set Configuration and Replica Set Administration documents for more information.
Adds a new arbiter to an existing replica set.
This function will briefly disconnect the shell and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
| Returns: | a document that contains the current replica set configuration object. |
|---|
rs.config() is an alias of rs.conf().
Forces the current node to become ineligible for election as primary for the specified period.
rs.freeze() provides a wrapper around the database command replSetFreeze.
Returns a basic help text for all of the replication related shell functions.
Initiates a replica set. Optionally takes a configuration argument in the form of a document that holds the configuration of a replica set. Consider the following model of the most basic configuration for a 3-member replica set:
{
_id : <setname>,
members : [
{_id : 0, host : <host0>},
{_id : 1, host : <host1>},
{_id : 2, host : <host2>},
]
}
This function provides a wrapper around the “replSetInitiate” database command.
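For example, a concrete invocation for a hypothetical three-member set named rs0 might look like the following (the hostnames are placeholders):

```
rs.initiate(
  {
    _id : "rs0",
    members : [
      { _id : 0, host : "mongodb0.example.net:27017" },
      { _id : 1, host : "mongodb1.example.net:27017" },
      { _id : 2, host : "mongodb2.example.net:27017" }
    ]
  }
)
```

Run this against one of the members; that member distributes the configuration to the others.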
Initializes a new replica set configuration. This function will briefly disconnect the shell and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
rs.reconfig() provides a wrapper around the “replSetReconfig” database command.
rs.reconfig() overwrites the existing replica set configuration. Retrieve the current configuration object with rs.conf(), modify the configuration as needed and then use rs.reconfig() to submit the modified configuration object.
To reconfigure a replica set, use the following sequence of operations:
conf = rs.conf()
// modify conf to change configuration
rs.reconfig(conf)
To force the reconfiguration when a majority of the set is not connected to the current member, or when you issue the command against a secondary, use the following form:
conf = rs.conf()
// modify conf to change configuration
rs.reconfig(conf, { force: true } )
Warning
Forcing a rs.reconfig() can lead to rollbacks and other situations that are difficult to recover from. Exercise caution when using this option.
See also
“Replica Set Configuration” and “Replica Set Administration”.
Removes the node described by the hostname parameter from the current replica set. This function will briefly disconnect the shell and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
Note
Before running the rs.remove() operation, you must shut down the replica set member that you’re removing.
Changed in version 2.2: This procedure is no longer required when using rs.remove(), but it remains good practice.
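As a sketch, run against the primary, the following removes a member by its hostname and port (the hostname here is a placeholder):

```
rs.remove("mongodb3.example.net:27017")
```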
Provides a shorthand for the following operation:
db.getMongo().setSlaveOk()
This permits read operations on secondary nodes for the current connection. See the readPref() method for more fine-grained control over read preference in the mongo shell.
| Returns: | A document with status information. |
|---|
This output reflects the current status of the replica set, using data derived from the heartbeat packets sent by the other members of the replica set.
This method provides a wrapper around the replSetGetStatus database command.
See also
“Replica Set Status Reference” for documentation of this output.
| Returns: | disconnects the shell. |
|---|
Forces the current replica set member to step down as primary and then attempt to avoid election as primary for the designated number of seconds. Produces an error if the current node is not primary.
This function will briefly disconnect the shell and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.
rs.stepDown() provides a wrapper around the database command replSetStepDown.
New in version 2.2.
Provides a wrapper around the replSetSyncFrom command, which allows administrators to configure the member of a replica set that the current member pulls data from. Specify the name of the member you want to sync from in the form of [hostname]:[port].
See replSetSyncFrom for more details.
For internal use.
Use this method to add a database instance or replica set to a sharded cluster. This method must be run on a mongos instance. The host parameter can be in any of the following forms:
[hostname]
[hostname]:[port]
[set]/[hostname]
[set]/[hostname],[hostname]:port
You can specify shards using the hostname, or a hostname and port combination if the shard is running on a non-standard port.
Warning
Do not use localhost for the hostname unless your configuration server is also running on localhost.
The optimal configuration is to deploy shards across replica sets. To add a shard on a replica set, you must specify the name of the replica set and the hostname of at least one member of the set. You may specify any subset of the members, or all of them; if you specify additional hostnames, all must be members of the same replica set. sh.addShard() takes the following form:
sh.addShard("set-name/seed-hostname")
Example
sh.addShard("repl0/mongodb3.example.net:27327")
The sh.addShard() method is a helper for the addShard command. The addShard command has additional options which are not available with this helper.
New in version 2.2.
sh.addShardTag() associates a shard with a tag or identifier. MongoDB can use these identifiers to “home” or attach (i.e. with sh.addTagRange()) specific data to a specific shard.
Always issue sh.addShardTag() when connected to a mongos instance. The following example adds three tags, LGA, EWR, and JFK, to three shards:
sh.addShardTag("shard0000", "LGA")
sh.addShardTag("shard0001", "EWR")
sh.addShardTag("shard0002", "JFK")
New in version 2.2.
sh.addTagRange() attaches a range of values of the shard key to a shard tag created using the sh.addShardTag() helper. Use this operation to ensure that the documents that exist within the specified range exist on shards that have a matching tag.
Always issue sh.addTagRange() when connected to a mongos instance.
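As a sketch, the following pairs sh.addShardTag() with sh.addTagRange() so that documents in a range of zipcode values live on the shard tagged NYC (the namespace, shard name, field, and bounds here are hypothetical):

```
sh.addShardTag("shard0000", "NYC")
sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
```

The lower bound is inclusive and the upper bound is exclusive, following the usual chunk range semantics.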
Enables sharding on the specified database. This does not automatically shard any collections, but makes it possible to begin sharding collections using sh.shardCollection().
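For example, a minimal sketch that enables sharding on a database and then shards one of its collections (the database, collection, and shard key here are hypothetical):

```
sh.enableSharding("records")
sh.shardCollection("records.users", { user_id: 1 } )
```

Issue both commands while connected to a mongos instance.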
| Returns: | boolean. |
|---|
Returns true if the balancer process is currently running and migrating chunks and false if the balancer process is not running. Use sh.getBalancerState() to determine if the balancer is enabled or disabled.
Moves the chunk containing the documents specified by the query to the shard described by destination.
This function provides a wrapper around the moveChunk command. In most circumstances, allow the balancer to migrate chunks automatically and avoid calling sh.moveChunk() directly.
New in version 2.2.
Removes the association between a tag and a shard. Always issue sh.removeShardTag() when connected to a mongos instance.
Enables or disables the balancer. Use sh.getBalancerState() to determine if the balancer is currently enabled or disabled and sh.isBalancerRunning() to check its current state.
| Returns: | boolean. |
|---|
sh.getBalancerState() returns true when the balancer is enabled and false when the balancer is disabled. This does not reflect the current state of balancing operations: use sh.isBalancerRunning() to check the balancer’s current state.
Shards the named collection, according to the specified shard key. Specify shard keys in the form of a document. Shard keys may refer to a single document field, or more typically several document fields to form a “compound shard key.”
Splits the chunk containing the document specified by the query as if that document were at the “middle” of the collection, even if the specified document is not the actual median of the collection. Use this command to manually split chunks unevenly. Use the “sh.splitFind()” function to split a chunk at the actual median.
In most circumstances, you should leave chunk splitting to the automated processes within MongoDB. However, when initially deploying a sharded cluster it is necessary to perform some measure of pre-splitting using manual methods including sh.splitAt().
Splits the chunk containing the document specified by the query at its median point, creating two roughly equal chunks. Use sh.splitAt() to split a collection at a specific point.
In most circumstances, you should leave chunk splitting to the automated processes. However, when initially deploying a sharded cluster it is necessary to perform some measure of pre-splitting using manual methods including sh.splitFind().
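As a sketch, the following contrasts the two helpers (the namespace and query values here are hypothetical):

```
// split the containing chunk at its actual median
sh.splitFind("records.users", { user_id: "abc123" })

// split the containing chunk exactly at this shard key value
sh.splitAt("records.users", { user_id: "m0000" })
```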
| Returns: | a formatted report of the status of the sharded cluster, including data regarding the distribution of chunks. |
|---|
The following documents provide mappings between MongoDB concepts and statements and SQL concepts and statements.
In addition to the charts that follow, you might want to consider the Frequently Asked Questions section for a selection of common questions about MongoDB.
The following table presents the MySQL/Oracle executables and the corresponding MongoDB executables.
| | MySQL/Oracle | MongoDB |
|---|---|---|
| Database Server | mysqld/oracle | mongod |
| Database Client | mysql/sqlplus | mongo |
The following table presents the various SQL terminology and concepts and the corresponding MongoDB terminology and concepts.
| SQL Terms/Concepts | MongoDB Terms/Concepts |
|---|---|
| database | database |
| table | collection |
| row | document or BSON document |
| column | field |
| index | index |
| table joins | embedded documents and linking |
| primary key: specify any unique column or column combination as primary key. | In MongoDB, the primary key is automatically set to the _id field. |
| aggregation (e.g. group by) | aggregation framework |
The following table presents the various SQL statements and the corresponding MongoDB statements. The examples in the table assume the following conditions:
The SQL examples assume a table named users.
The MongoDB examples assume a collection named users that contains documents of the following prototype:
{
_id: ObjectId("509a8fb2f3f4948bd2f983a0"),
user_id: "abc123",
age: 55,
status: 'A'
}
The following table presents the various SQL statements related to table-level actions and the corresponding MongoDB statements.
| SQL Schema Statements | MongoDB Schema Statements | Reference |
|---|---|---|
CREATE TABLE users (
id MEDIUMINT NOT NULL
AUTO_INCREMENT,
user_id Varchar(30),
age Number,
status char(1),
PRIMARY KEY (id)
)
|
Implicitly created on first insert operation. The primary key _id is automatically added if _id field is not specified. db.users.insert( {
user_id: "abc123",
age: 55,
status: "A"
} )
However, you can also explicitly create a collection: db.createCollection("users")
|
See insert() and createCollection() for more information. |
ALTER TABLE users
ADD join_date DATETIME
|
Collections do not describe or enforce the structure of the constituent documents. See the Schema Design wiki page for more information. | See update() and $set for more information on changing the structure of documents in a collection. |
ALTER TABLE users
DROP COLUMN join_date
|
Collections do not describe or enforce the structure of the constituent documents. See the Schema Design wiki page for more information. | See update() and $set for more information on changing the structure of documents in a collection. |
CREATE INDEX idx_user_id_asc
ON users(user_id)
|
db.users.ensureIndex( { user_id: 1 } )
|
See ensureIndex() and indexes for more information. |
CREATE INDEX
idx_user_id_asc_age_desc
ON users(user_id, age DESC)
|
db.users.ensureIndex( { user_id: 1, age: -1 } )
|
See ensureIndex() and indexes for more information. |
DROP TABLE users
|
db.users.drop()
|
See drop() for more information. |
The following table presents the various SQL statements related to inserting records into tables and the corresponding MongoDB statements.
| SQL INSERT Statements | MongoDB insert() Statements | Reference |
|---|---|---|
INSERT INTO users(user_id,
age,
status)
VALUES ("bcd001",
45,
"A")
|
db.users.insert( {
user_id: "bcd001",
age: 45,
status: "A"
} )
|
See insert() for more information. |
The following table presents the various SQL statements related to reading records from tables and the corresponding MongoDB statements.
| SQL SELECT Statements | MongoDB find() Statements | Reference |
|---|---|---|
SELECT *
FROM users
|
db.users.find()
|
See find() for more information. |
SELECT id, user_id, status
FROM users
|
db.users.find(
{ },
{ user_id: 1, status: 1 }
)
|
See find() for more information. |
SELECT user_id, status
FROM users
|
db.users.find(
{ },
{ user_id: 1, status: 1, _id: 0 }
)
|
See find() for more information. |
SELECT *
FROM users
WHERE status = "A"
|
db.users.find(
{ status: "A" }
)
|
See find() for more information. |
SELECT user_id, status
FROM users
WHERE status = "A"
|
db.users.find(
{ status: "A" },
{ user_id: 1, status: 1, _id: 0 }
)
|
See find() for more information. |
SELECT *
FROM users
WHERE status != "A"
|
db.users.find(
{ status: { $ne: "A" } }
)
|
See find() and $ne for more information. |
SELECT *
FROM users
WHERE status = "A"
AND age = 50
|
db.users.find(
{ status: "A",
age: 50 }
)
|
See find() and $and for more information. |
SELECT *
FROM users
WHERE status = "A"
OR age = 50
|
db.users.find(
{ $or: [ { status: "A" } ,
{ age: 50 } ] }
)
|
See find() and $or for more information. |
SELECT *
FROM users
WHERE age > 25
|
db.users.find(
{ age: { $gt: 25 } }
)
|
See find() and $gt for more information. |
SELECT *
FROM users
WHERE age < 25
|
db.users.find(
{ age: { $lt: 25 } }
)
|
See find() and $lt for more information. |
SELECT *
FROM users
WHERE age > 25
AND age <= 50
|
db.users.find(
{ age: { $gt: 25, $lte: 50 } }
)
|
See find(), $gt, and $lte for more information. |
SELECT *
FROM users
WHERE user_id like "%bc%"
|
db.users.find(
{ user_id: /bc/ }
)
|
See find() and $regex for more information. |
SELECT *
FROM users
WHERE user_id like "bc%"
|
db.users.find(
{ user_id: /^bc/ }
)
|
See find() and $regex for more information. |
SELECT *
FROM users
WHERE status = "A"
ORDER BY user_id ASC
|
db.users.find( { status: "A" } ).sort( { user_id: 1 } )
|
See find() and sort() for more information. |
SELECT *
FROM users
WHERE status = "A"
ORDER BY user_id DESC
|
db.users.find( { status: "A" } ).sort( { user_id: -1 } )
|
See find() and sort() for more information. |
SELECT COUNT(*)
FROM users
|
db.users.count()
or db.users.find().count()
|
See find() and count() for more information. |
SELECT COUNT(user_id)
FROM users
|
db.users.count( { user_id: { $exists: true } } )
or db.users.find( { user_id: { $exists: true } } ).count()
|
See find(), count(), and $exists for more information. |
SELECT COUNT(*)
FROM users
WHERE age > 30
|
db.users.count( { age: { $gt: 30 } } )
or db.users.find( { age: { $gt: 30 } } ).count()
|
See find(), count(), and $gt for more information. |
SELECT DISTINCT(status)
FROM users
|
db.users.distinct( "status" )
|
See find() and distinct() for more information. |
SELECT *
FROM users
LIMIT 1
|
db.users.findOne()
or db.users.find().limit(1)
|
See find(), findOne(), and limit() for more information. |
SELECT *
FROM users
LIMIT 5
SKIP 10
|
db.users.find().limit(5).skip(10)
|
See find(), limit(), and skip() for more information. |
EXPLAIN SELECT *
FROM users
WHERE status = "A"
|
db.users.find( { status: "A" } ).explain()
|
See find() and explain() for more information. |
The following table presents the various SQL statements related to updating existing records in tables and the corresponding MongoDB statements.
| SQL Update Statements | MongoDB update() Statements | Reference |
|---|---|---|
UPDATE users
SET status = "C"
WHERE age > 25
|
db.users.update(
{ age: { $gt: 25 } },
{ $set: { status: "C" } },
{ multi: true }
)
|
See update(), $gt, and $set for more information. |
UPDATE users
SET age = age + 3
WHERE status = "A"
|
db.users.update(
{ status: "A" } ,
{ $inc: { age: 3 } },
{ multi: true }
)
|
See update(), $inc, and $set for more information. |
The following table presents the various SQL statements related to deleting records from tables and the corresponding MongoDB statements.
| SQL Delete Statements | MongoDB remove() Statements | Reference |
|---|---|---|
DELETE FROM users
WHERE status = "D"
|
db.users.remove( { status: "D" } )
|
See remove() for more information. |
DELETE FROM users
|
db.users.remove( )
|
See remove() for more information. |
The aggregation framework provides MongoDB with native aggregation capabilities that correspond to many common data aggregation operations in SQL. If you’re new to MongoDB, you might want to consult the Frequently Asked Questions section for a selection of common questions.
The following table provides an overview of common SQL aggregation terms, functions, and concepts and the corresponding MongoDB aggregation operators:
| SQL Terms, Functions, and Concepts | MongoDB Aggregation Operators |
|---|---|
| WHERE | $match |
| GROUP BY | $group |
| HAVING | $match |
| SELECT | $project |
| ORDER BY | $sort |
| LIMIT | $limit |
| SUM() | $sum |
| COUNT() | $sum |
| join | No direct corresponding operator; however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document. |
The following table presents a quick reference of SQL aggregation statements and the corresponding MongoDB statements. The examples in the table assume the following conditions:
The SQL examples assume two tables, orders and order_lineitem, joined by the order_lineitem.order_id and orders.id columns.
The MongoDB examples assume one collection, orders, that contains documents of the following prototype:
{
cust_id: "abc123",
ord_date: ISODate("2012-11-02T17:04:11.102Z"),
status: 'A',
price: 50,
items: [ { sku: "xxx", qty: 25, price: 1 },
{ sku: "yyy", qty: 25, price: 1 } ]
}
The MongoDB statements prefix the names of fields from the documents in the orders collection with a $ character when they appear as operands to the aggregation operations.
| SQL Example | MongoDB Example | Description |
|---|---|---|
SELECT COUNT(*) AS count
FROM orders
|
db.orders.aggregate( [
{ $group: { _id: null,
count: { $sum: 1 } } }
] )
|
Count all records from orders |
SELECT SUM(price) AS total
FROM orders
|
db.orders.aggregate( [
{ $group: { _id: null,
total: { $sum: "$price" } } }
] )
|
Sum the price field from orders |
SELECT cust_id,
SUM(price) AS total
FROM orders
GROUP BY cust_id
|
db.orders.aggregate( [
{ $group: { _id: "$cust_id",
total: { $sum: "$price" } } }
] )
|
For each unique cust_id, sum the price field. |
SELECT cust_id,
SUM(price) AS total
FROM orders
GROUP BY cust_id
ORDER BY total
|
db.orders.aggregate( [
{ $group: { _id: "$cust_id",
total: { $sum: "$price" } } },
{ $sort: { total: 1 } }
] )
|
For each unique cust_id, sum the price field, results sorted by sum. |
SELECT cust_id,
ord_date,
SUM(price) AS total
FROM orders
GROUP BY cust_id, ord_date
|
db.orders.aggregate( [
{ $group: { _id: { cust_id: "$cust_id",
ord_date: "$ord_date" },
total: { $sum: "$price" } } }
] )
|
For each unique cust_id, ord_date grouping, sum the price field. |
SELECT cust_id, count(*)
FROM orders
GROUP BY cust_id
HAVING count(*) > 1
|
db.orders.aggregate( [
{ $group: { _id: "$cust_id",
count: { $sum: 1 } } },
{ $match: { count: { $gt: 1 } } }
] )
|
For cust_id with multiple records, return the cust_id and the corresponding record count. |
SELECT cust_id,
ord_date,
SUM(price) AS total
FROM orders
GROUP BY cust_id, ord_date
HAVING total > 250
|
db.orders.aggregate( [
{ $group: { _id: { cust_id: "$cust_id",
ord_date: "$ord_date" },
total: { $sum: "$price" } } },
{ $match: { total: { $gt: 250 } } }
] )
|
For each unique cust_id, ord_date grouping, sum the price field and return only where the sum is greater than 250. |
SELECT cust_id,
SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
|
db.orders.aggregate( [
{ $match: { status: 'A' } },
{ $group: { _id: "$cust_id",
total: { $sum: "$price" } } }
] )
|
For each unique cust_id with status A, sum the price field. |
SELECT cust_id,
SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
HAVING total > 250
|
db.orders.aggregate( [
{ $match: { status: 'A' } },
{ $group: { _id: "$cust_id",
total: { $sum: "$price" } } },
{ $match: { total: { $gt: 250 } } }
] )
|
For each unique cust_id with status A, sum the price field and return only where the sum is greater than 250. |
SELECT cust_id,
SUM(li.qty) as qty
FROM orders o,
order_lineitem li
WHERE li.order_id = o.id
GROUP BY cust_id
|
db.orders.aggregate( [
{ $unwind: "$items" },
{ $group: { _id: "$cust_id",
qty: { $sum: "$items.qty" } } }
] )
|
For each unique cust_id, sum the corresponding line item qty fields associated with the orders. |
SELECT COUNT(*)
FROM (SELECT cust_id, ord_date
FROM orders
GROUP BY cust_id, ord_date) as DerivedTable
|
db.orders.aggregate( [
{ $group: { _id: { cust_id: "$cust_id",
ord_date: "$ord_date" } } },
{ $group: { _id: null, count: { $sum: 1 } } }
] )
|
Count the number of distinct cust_id, ord_date groupings. |
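The final example's two-stage pipeline can be illustrated in plain JavaScript to show what each $group stage computes; the sample documents below are hypothetical:

```javascript
// Hypothetical sample of the "orders" collection, keeping only the
// fields the pipeline uses.
const orders = [
  { cust_id: "abc123", ord_date: "2012-11-02" },
  { cust_id: "abc123", ord_date: "2012-11-02" },
  { cust_id: "abc123", ord_date: "2012-11-03" },
  { cust_id: "xyz456", ord_date: "2012-11-02" }
];

// Stage 1: { $group: { _id: { cust_id, ord_date } } } collapses the
// input to one document per distinct (cust_id, ord_date) pair.
const distinct = new Set(orders.map(o => `${o.cust_id}|${o.ord_date}`));

// Stage 2: { $group: { _id: null, count: { $sum: 1 } } } counts the
// documents produced by stage 1.
const count = distinct.size;

console.log(count); // 3 distinct (cust_id, ord_date) groupings
```

The duplicate (abc123, 2012-11-02) rows collapse in stage 1, so the second stage counts three groupings, matching the SQL derived-table query.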
For this reference material in another form, consider the following interface overview pages:
The core components in the MongoDB package are: mongod, the core database process; mongos the controller and query router for sharded clusters; and mongo the interactive MongoDB Shell.
mongod is the primary daemon process for the MongoDB system. It handles data requests, manages data format, and performs background management operations.
This document provides a complete overview of all command line options for mongod. These options are primarily useful for testing. In common operation, use the configuration file options to control the behavior of your database; the configuration file supports all of the operations described below.
Returns a basic help and usage text.
Returns the version of the mongod daemon.
Specifies a configuration file that holds runtime configuration options. While the options are equivalent to and accessible via the other command line arguments, the configuration file is the preferred method for runtime configuration of mongod. See the “Configuration File Options” document for more information about these options.
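For example, a typical invocation looks like the following (the file path here is a common convention, not a requirement):

```
mongod --config /etc/mongodb.conf
```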
Increases the amount of internal reporting returned on standard output or in the log file specified by --logpath. Use the -v form to control the level of verbosity by including the option multiple times (e.g. -vvvvv).
Runs the mongod instance in a quiet mode that attempts to limit the amount of output. This option suppresses:
Specifies a TCP port for the mongod to listen for client connections. By default mongod listens for connections on port 27017.
UNIX-like systems require root privileges to use ports with numbers lower than 1024.
The IP address that the mongod process binds to and listens on for connections. By default mongod listens for connections on the localhost interface (i.e. the 127.0.0.1 address). You may attach mongod to any interface; however, if you attach mongod to a publicly accessible interface, ensure that you have implemented proper authentication and/or firewall restrictions to protect the integrity of your database.
Specifies the maximum number of simultaneous connections that mongod will accept. This setting will have no effect if it is higher than your operating system’s configured maximum connection tracking threshold.
Note
You cannot set maxConns to a value higher than 20000.
Forces the mongod to validate all requests from clients upon receipt to ensure that invalid objects are never inserted into the database. Enabling this option will produce some performance impact, and is not enabled by default.
Specify a path for the log file that will hold all diagnostic logging information.
Unless specified, mongod will output all log information to the standard output. Additionally, unless you also specify --logappend, the logfile will be overwritten when the process restarts.
Note
The behavior of the logging system may change in the near future in response to the SERVER-4499 case.
When specified, this option ensures that mongod appends new entries to the end of the logfile rather than overwriting the content of the log when the process restarts.
New in version 2.1.0.
Sends all logging output to the host’s syslog system rather than to standard output or a log file as with --logpath.
Specify a file location to hold the “PID” or process ID of the mongod process. Useful for tracking the mongod process in combination with the mongod --fork option.
If this option is not set, mongod will create no PID file.
Specify the path to a key file to store authentication information. This option is only useful for the connection between replica set members.
See also
Disables listening on the UNIX socket. Unless set to false, mongod and mongos provide a UNIX socket.
Specifies a path for the UNIX socket. Unless this option has a value, mongod and mongos create a socket with /tmp as a prefix.
Enables a daemon mode for mongod that runs the process in the background. This is the normal mode of operation in production and production-like environments, but may not be desirable for testing.
Enables database authentication for users connecting from remote hosts. Configure users via the mongo shell. If no users exist, the localhost interface will continue to have access to the database until you create the first user.
See the “Security and Authentication” wiki page for more information regarding this functionality.
Forces mongod to report the percentage of CPU time in write lock. mongod generates output every four seconds. MongoDB writes this data to standard output or the logfile if using the logpath option.
Specify a directory for the mongod instance to store its data. Typical locations include: /srv/mongodb, /var/lib/mongodb or /opt/mongodb
Unless specified, mongod will look for data files in the default /data/db directory. (Windows systems use the \data\db directory.) If you installed MongoDB using a package management system, check the /etc/mongodb.conf file provided by your packages to see the configured dbpath.
Creates a very verbose, diagnostic log for troubleshooting and recording various errors. MongoDB writes these log files in the dbpath directory in a series of files that begin with the string diaglog and end with the initiation time of the logging as a hex string.
The specified value configures the level of verbosity. Possible values, and their impact are as follows.
| Value | Setting |
|---|---|
| 0 | off. No logging. |
| 1 | Log write operations. |
| 2 | Log read operations. |
| 3 | Log both read and write operations. |
| 7 | Log write and some read operations. |
You can use the mongosniff tool to replay this output for investigation. Given a typical diaglog file, located at /data/db/diaglog.4f76a58c, you might use a command in the following form to read these files:
mongosniff --source DIAGLOG /data/db/diaglog.4f76a58c
--diaglog is for internal use and not intended for most users.
Warning
Setting the diagnostic level to 0 will cause mongod to stop writing data to the diagnostic log file. However, the mongod instance will continue to keep the file open, even if it is no longer writing data to the file. If you want to rename, move, or delete the diagnostic log you must cleanly shut down the mongod instance before doing so.
Alters the storage pattern of the data directory to store each database’s files in a distinct folder. This option creates a directory within --dbpath named for each database.
Use this option in conjunction with your file system and device configuration so that MongoDB will store data on a number of distinct disk devices to increase write throughput or disk capacity.
Enables operation journaling to ensure write durability and data consistency. mongod enables journaling by default on 64-bit builds of versions after 2.0.
Provides functionality for testing. Not for general use, and may affect database integrity.
Specifies the maximum amount of time for mongod to allow between journal operations. The default value is 100 milliseconds, while possible values range from 2 to 300 milliseconds. Lower values increase the durability of the journal, at the expense of disk performance.
To force mongod to commit to the journal more frequently, you can specify j:true. When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.
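In the 2.2-era shell, a client requests journaled write acknowledgment through getLastError; a hedged sketch (the collection and document are hypothetical):

```
db.products.insert( { sku: "xxx", qty: 25 } )
db.runCommand( { getLastError: 1, j: true } )
```

With j:true, getLastError does not return until the preceding write has been committed to the journal.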
Specify this option to enable IPv6 support, which allows clients to connect to mongod over IPv6 networks. IPv6 support is disabled by default in mongod and all utilities.
Permits JSONP access via an HTTP interface. Consider the security implications of allowing this activity before enabling this option.
Disable authentication. Currently the default. Exists for future compatibility and clarity.
Disables the HTTP interface.
Disables the durability journaling. By default, mongod enables journaling in 64-bit versions after v2.0.
Disables the preallocation of data files. This shortens startup time in some cases, but can cause significant performance penalties during normal operations.
Disables the scripting engine.
Forbids operations that require a table scan.
Specifies the default size for namespace files (i.e. .ns files). This option has no impact on the size of existing namespace files. The maximum size is 2047 megabytes.
The default value is 16 megabytes; this provides for approximately 24,000 namespaces. Each collection, as well as each index, counts as a namespace.
Changes the level of database profiling, which inserts information about operation performance into output of mongod or the log file. The following levels are available:
| Level | Setting |
|---|---|
| 0 | Off. No profiling. |
| 1 | On. Only includes slow operations. |
| 2 | On. Includes all operations. |
Profiling is off by default. Database profiling can impact database performance. Enable this option only after careful consideration.
Enables a maximum limit for the number of data files each database can have. When running with --quota, MongoDB allows a maximum of 8 data files per database. Adjust the quota with the --quotaFiles option.
Modify limit on the number of data files per database. This option requires the --quota setting. The default value for --quotaFiles is 8.
Runs a repair routine on all databases. This is equivalent to shutting down and running the repairDatabase database command on all databases.
Warning
In general, if you have an intact copy of your data, such as would exist on a very recent backup or an intact member of a replica set, do not use repairDatabase or related options like db.repairDatabase() in the mongo shell or mongod --repair. Restore from an intact copy of your data.
Note
When using journaling, there is almost never any need to run repairDatabase. In the event of an unclean shutdown, the server will be able to restore the data files to a pristine state automatically.
Changed in version 2.1.2.
If you run the repair option and have data in a journal file, mongod will refuse to start. In these cases you should start mongod without the --repair option to allow mongod to recover data from the journal. This will complete more quickly and will result in a more consistent and complete data set.
To continue the repair operation despite the journal files, shut down mongod cleanly and restart with the --repair option.
Specifies the root directory containing MongoDB data files, to use for the --repair operation. Defaults to the value specified by --dbpath.
Defines the value of “slow” for the --profile option. The database logs all slow queries to the log, even when the profiler is not turned on. When the database profiler is on, the profiler writes to the system.profile collection. See the profile command for more information on the database profiler.
Enables a mode where MongoDB uses a smaller default file size. Specifically, --smallfiles reduces the initial size for data files and limits them to 512 megabytes. --smallfiles also reduces the size of each journal file from 1 gigabyte to 128 megabytes.
Use --smallfiles if you have a large number of databases that each hold a small quantity of data. --smallfiles can lead your mongod to create a large number of files, which may affect performance for larger databases.
Used in control scripts, the --shutdown option will cleanly and safely terminate the mongod process. When invoking mongod with this option you must set the --dbpath option, either directly or by way of the configuration file and the --config option.
mongod writes data very quickly to the journal, and lazily to the data files. --syncdelay controls how much time can pass before MongoDB flushes data to the datafiles via an fsync operation. The default setting is 60 seconds. We recommend almost always using the default setting of 60.
The serverStatus command reports the background flush thread’s status via the backgroundFlushing field.
Note
If --syncdelay is 0, mongod flushes all operations to disk immediately, which has a significant impact on performance. Run with journal enabled, which is the default for 64-bit MongoDB builds.
Returns diagnostic system information and then exits. The information provides the page size, the number of physical pages, and the number of available physical pages.
Upgrades the on-disk data format of the files specified by the --dbpath to the latest version, if needed.
This option only affects the operation of mongod if the data files are in an old format.
Note
In most cases you should not set this value, so you can exercise the most control over your upgrade process. See the MongoDB release notes (on the download page) for more information about the upgrade process.
For internal diagnostic use only.
Use this option to configure replication with replica sets. Specify a setname as an argument to this option. All hosts must have the same set name.
See also
“Replication,” “Replica Set Administration,” and “Replica Set Configuration“
Specifies a maximum size in megabytes for the replication operation log (i.e. the oplog). By default, mongod creates an oplog based on the maximum amount of space available. For 64-bit systems, the oplog is typically 5% of available disk space.
Once the mongod has created the oplog for the first time, changing --oplogSize will not affect the size of the oplog.
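The 5% default can be sketched with a quick calculation; the free-space figure below is a hypothetical example, not a measurement:

```shell
# Sketch of mongod's default oplog sizing on a 64-bit system:
# roughly 5% of available disk space. The free-space value is a
# hypothetical example.
avail_mb=204800                      # e.g. 200 GB of free disk space
oplog_mb=$(( avail_mb * 5 / 100 ))   # 5% of available space
echo "$oplog_mb"                     # 10240 MB, i.e. 10 GB
```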
In the context of replica set replication, set this option if you have seeded this replica with a snapshot of the dbpath of another member of the set. Otherwise the mongod will attempt to perform a full sync.
Warning
If the data is not perfectly synchronized and mongod starts with fastsync, then the secondary or slave will be permanently out of sync with the primary, which may cause significant consistency problems.
New in version 2.2.
You must use --replIndexPrefetch in conjunction with --replSet. The default value is all and available options are:
By default secondary members of a replica set will load all indexes related to an operation into memory before applying operations from the oplog. You can modify this behavior so that the secondaries will only load the _id index. Specify _id_only or none to prevent the mongod from loading any index into memory.
These options provide access to conventional master-slave database replication. While this functionality remains accessible in MongoDB, replica sets are the preferred configuration for database replication.
For use with the --slave option, the --source option designates the server that this instance will replicate.
For use with the --slave option, the --only option specifies only a single database to replicate.
For use with the --slave option, the --slavedelay option configures a “delay” in seconds, for this slave to wait to apply operations from the master node.
For use with the --slave option, the --autoresync option allows this slave to automatically resync if the local data is more than 10 seconds behind the master. This option may be problematic if the oplog is too small (controlled by the --oplogSize option). If the oplog is not large enough to store the difference in changes between the master’s current state and the state of the slave, this node will forcibly resync itself unnecessarily. With the --autoresync option set, the slave will not attempt an automatic resync more than once in a ten minute period.
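The resync trigger described above can be sketched as follows; the timestamps are hypothetical values standing in for the positions of the master and slave:

```shell
# Sketch of the --autoresync trigger: a slave more than 10 seconds
# behind the master forces a resync (at most once per ten minutes).
# Timestamps are hypothetical, in seconds.
master_ts=1350000120
slave_ts=1350000100
lag=$(( master_ts - slave_ts ))
if [ "$lag" -gt 10 ]; then
    echo "resync"
else
    echo "in sync"
fi
```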
Declares that this mongod instance serves as the config database of a sharded cluster. When running with this option, clients will not be able to write data to any database other than config and admin. The default port for mongod with this option is 27019 and mongod writes all data files to the /configdb sub-directory of the --dbpath directory.
Configures this mongod instance as a shard in a partitioned cluster. The default port for these instances is 27018. The only effect of --shardsvr is to change the port number.
Disables a “paranoid mode” for data writes for chunk migration operation. See the chunk migration and moveChunk command documentation for more information.
By default mongod will save copies of migrated chunks on the “from” server during migrations as “paranoid mode.” Setting this option disables this paranoia.
In common usage, the invocation of mongod will resemble the following in the context of an initialization or control script:
mongod --config /etc/mongodb.conf
See the “Configuration File Options” for more information on how to configure mongod using the configuration file.
mongos, for “MongoDB Shard,” is a routing service for MongoDB shard configurations: it processes queries from the application layer and determines the location of the data in the sharded cluster in order to complete these operations. From the perspective of the application, a mongos instance behaves identically to any other MongoDB instance.
See also
See the “Sharding” wiki page for more information regarding MongoDB’s sharding functionality.
Note
Changed in version 2.1.
Some aggregation operations using aggregate will cause mongos instances to require more CPU resources than in previous versions. This modified performance profile may dictate alternate architecture decisions if you make use of the aggregation framework extensively in a sharded environment.
Returns a basic help and usage text.
Returns the version of the mongod daemon.
Specifies a configuration file, that you can use to specify runtime-configurations. While the options are equivalent and accessible via the other command line arguments, the configuration file is the preferred method for runtime configuration of mongod. See the “Configuration File Options” document for more information about these options.
Not all configuration options for mongod make sense in the context of mongos.
Increases the amount of internal reporting returned on standard output or in the log file specified by --logpath. Use the -v form to control the level of verbosity by including the option multiple times, (e.g. -vvvvv.)
Specifies a TCP port for the mongos to listen for client connections. By default mongos listens for connections on port 27017.
UNIX-like systems require root access to bind to ports with numbers lower than 1024.
The IP address that the mongos process will bind to and listen for connections. By default mongos listens for connections on the localhost (i.e. 127.0.0.1 address.) You may attach mongos to any interface; however, if you attach mongos to a publicly accessible interface you must implement proper authentication or firewall restrictions to protect the integrity of your database.
Specifies the maximum number of simultaneous connections that mongos will accept. This setting will have no effect if the value of this setting is higher than your operating system’s configured maximum connection tracking threshold.
This is particularly useful for mongos if you have a client that creates a number of connections but allows them to timeout rather than closing them. When you set maxConns, ensure the value is slightly higher than the size of the connection pool or the total number of connections to prevent erroneous connection spikes from propagating to the members of a shard cluster.
Note
You cannot set maxConns to a value higher than 20000.
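As a sizing sketch, with hypothetical figures for the connection pool and the number of application servers:

```shell
# Sketch: set maxConns slightly above the expected total number of
# client connections so transient spikes do not propagate to the
# shards. All figures are hypothetical.
pool_size=100        # connections per application server
app_servers=8
headroom=50          # slack above the expected total
max_conns=$(( pool_size * app_servers + headroom ))
echo "$max_conns"    # 850, well under the 20000 cap
```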
Forces the mongos to validate all requests from clients upon receipt to ensure that invalid objects are never inserted into the database. This option has a performance impact, and is not enabled by default.
Specify a path for the log file that will hold all diagnostic logging information.
Unless specified, mongos will output all log information to the standard output. Additionally, unless you also specify --logappend, the logfile will be overwritten when the process restarts.
Specify to ensure that mongos appends additional logging data to the end of the logfile rather than overwriting the content of the log when the process restarts.
New in version 2.1.0.
Sends all logging output to the host’s syslog system rather than to standard output or a log file as with --logpath.
Specify a file location to hold the “PID” or process ID of the mongos process. Useful for tracking the mongos process in combination with the mongos --fork option.
Without this option, mongos will not create a PID file.
Specify the path to a key file to store authentication information. This option is only useful for the connection between mongos instances and components of the sharded cluster.
See also
Disables listening on the UNIX socket. Without this option mongos creates a UNIX socket.
Specifies a path for the UNIX socket. Unless specified, mongos creates a socket in the /tmp path.
Enables a daemon mode for mongos which forces the process to run in the background. This is the normal mode of operation in production and production-like environments, but may not be desirable for testing.
Set this option to specify a configuration database (i.e. config database) for the sharded cluster. You must specify either 1 configuration server or 3 configuration servers, in a comma separated list.
Note
mongos instances read from the first config server in the list provided. All mongos instances must specify the hosts to the --configdb setting in the same order.
If your configuration databases reside in more than one data center, order the hosts in the --configdb argument so that the config database closest to the majority of your mongos instances is first in the list.
Warning
Never remove a config server from the --configdb parameter, even if the config server or servers are not available, or offline.
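For example, a three-member --configdb argument might look like the following; the host names are hypothetical, and every mongos in the cluster must list them in this same order:

```shell
# Sketch: a --configdb argument naming three config servers. Host
# names are hypothetical; all mongos instances must use this order.
configdb="cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019"
echo "mongos --configdb $configdb"
```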
This option is for internal testing use only, and runs unit tests without starting a mongos instance.
This option updates the metadata format used by the config database.
The value of --chunkSize determines the size of each chunk of data distributed around the sharded cluster. The default value is 64 megabytes, which is the ideal size for chunks in most deployments: a larger chunk size can lead to uneven data distribution, while a smaller chunk size often leads to inefficient movement of chunks between nodes. However, in some circumstances it may be necessary to set a different chunk size.
This option only sets the chunk size when initializing the cluster for the first time. If you modify the run-time option later, the new value will have no effect. See the “Modify Chunk Size” procedure if you need to change the chunk size on an existing sharded cluster.
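The tradeoff can be sketched with a quick calculation; the collection size below is a hypothetical example:

```shell
# Sketch: chunk counts for a 64 GB sharded collection at the default
# 64 MB chunk size versus a smaller 16 MB size. More, smaller chunks
# mean more frequent chunk movement. Figures are hypothetical.
data_mb=65536                       # e.g. a 64 GB collection
echo $(( data_mb / 64 ))            # 1024 chunks at the 64 MB default
echo $(( data_mb / 16 ))            # 4096 chunks at a 16 MB setting
```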
Enables IPv6 support to allow clients to connect to mongos using IPv6 networks. MongoDB disables IPv6 support by default in mongod and all utilities.
Permits JSONP access via an HTTP interface. Consider the security implications of allowing this activity before enabling this option.
Disables the scripting engine.
New in version 2.1.2.
Disables the HTTP interface.
New in version 2.2.
--localThreshold affects the logic that mongos uses when selecting replica set members to pass read operations to from clients. Specify a value for --localThreshold in milliseconds. The default value is 15, which corresponds to the default value in all of the client drivers.
When mongos receives a request that permits reads to secondary members, the mongos will:
1. Find the member of the set with the lowest ping time.
2. Construct a list of replica set members that is within a ping time of 15 milliseconds of the nearest suitable member of the set. If you specify a value for --localThreshold, mongos will construct the list of replica members that are within the latency allowed by this value.
3. Select a member to read from at random from this list.
The ping time used for a set member, compared against the --localThreshold setting, is a moving average of recent ping times, calculated, at most, every 10 seconds. As a result, some queries may reach members above the threshold until the mongos recalculates the average.
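The averaging behavior can be sketched as follows; the weighting here is purely illustrative and is not mongos’s actual algorithm:

```shell
# Sketch of a moving average of ping times, illustrating why a member
# may briefly remain eligible after its latency changes. The 4:1
# weighting is illustrative, not mongos's actual algorithm.
avg=20                               # previous average, in milliseconds
sample=10                            # newest ping measurement
avg=$(( (avg * 4 + sample) / 5 ))    # blend the new sample in
echo "$avg"                          # 18: the average moves gradually
```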
See the Member Selection section of the read preference documentation for more information.
mongo is an interactive JavaScript shell interface to MongoDB. The mongo command provides a powerful interface for systems administrators as well as a way to test queries and operations directly with the database. To increase the flexibility of the mongo command, the shell provides a fully functional JavaScript environment. This document addresses the basic invocation of the mongo shell and an overview of its usage.
Enables the shell interface after evaluating a JavaScript file. If you invoke the mongo command and specify a JavaScript file as an argument, or use mongo --eval to specify JavaScript on the command line, the mongo --shell option provides the user with a shell prompt after the file finishes executing.
Prevents the shell from connecting to any database instances.
Prevents the shell from sourcing and evaluating ~/.mongorc.js on startup.
Silences output from the shell during the connection process.
Specifies the port where the mongod or mongos instance is listening. Unless specified mongo connects to mongod instances on port 27017, which is the default mongod port.
Specifies the host where the mongod or mongos to connect to is running, as <HOSTNAME>. By default mongo will attempt to connect to a MongoDB process running on the localhost.
Evaluates a JavaScript expression specified as an argument to this option. mongo does not load its own environment when evaluating code: as a result many options of the shell environment are not available.
Specifies a username to authenticate to the MongoDB instance. Use in conjunction with the mongo --password option to supply a password. If you specify a username and password but the default database or the specified database do not require authentication, mongo will exit with an exception.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongo --username option to supply a username. If you specify a --username without the mongo --password option, mongo will prompt for a password interactively, if the mongod or mongos requires authentication.
Returns a basic help and usage text.
Returns the version of the shell.
Increases the verbosity of the output of the shell during the connection process.
Enables IPv6 support that allows mongo to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongo, disable IPv6 support by default.
Specifies the “database address” of the database to connect to. For example:
mongo admin
The above command will connect the mongo shell to the admin database on the local machine. You may specify a remote database instance using a resolvable hostname or IP address. Separate the database name from the hostname using a / character. See the following examples:
mongo mongodb1.example.net
mongo mongodb1/admin
mongo 10.8.8.10/test
Specifies a JavaScript file to run and then exit. Must be the last option specified. Use the mongo --shell option to return to a shell after the file finishes running.
~/.dbshell
mongo maintains a history of commands in the .dbshell file.
Note
Interaction related to authentication, including authenticate and db.addUser() are not saved in the history file.
Warning
Versions of Windows mongo.exe earlier than 2.2.0 will save the .dbshell file in the mongo.exe working directory.
~/.mongorc.js
mongo will read .mongorc.js from the home directory of the user invoking mongo. Specify the mongo --norc option to disable reading .mongorc.js.
/tmp/mongo_edit<time_t>.js
Created by mongo when editing a file. If the file exists mongo will append an integer from 1 to 10 to the time value to attempt to create a unique file.
%TEMP%mongo_edit<time_t>.js
Created by mongo.exe on Windows when editing a file. If the file exists mongo will append an integer from 1 to 10 to the time value to attempt to create a unique file.
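The uniquifying behavior that both platforms share can be sketched as a loop; the timestamp and paths below are illustrative:

```shell
# Sketch of the unique-name behavior described above: if the file
# already exists, append an integer from 1 to 10 to the time value.
# The timestamp is a sample value.
t=1350000000
name="/tmp/mongo_edit${t}.js"
i=1
while [ -e "$name" ] && [ "$i" -le 10 ]; do
    name="/tmp/mongo_edit${t}${i}.js"
    i=$(( i + 1 ))
done
echo "$name"
```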
Specifies the path to an editor to use with the edit shell command. A JavaScript variable named EDITOR will override the value of the EDITOR environment variable.
Specifies the path to the home directory where mongo will read the .mongorc.js file and write the .dbshell file.
On Windows systems, HOMEDRIVE specifies the path to the directory where mongo will read the .mongorc.js file and write the .dbshell file.
Specifies the Windows path to the home directory where mongo will read the .mongorc.js file and write the .dbshell file.
Typically users invoke the shell with the mongo command at the system prompt. Consider the following examples for other scenarios.
To connect to a database on a remote host using authentication and a non-standard port, use the following form:
mongo --username <user> --password <pass> --host <host> --port 28015
Alternatively, consider the following short form:
mongo -u <user> -p <pass> --host <host> --port 28015
Replace <user>, <pass>, and <host> with the appropriate values for your situation and substitute or omit the --port as needed.
To execute a JavaScript file without evaluating the ~/.mongorc.js file before starting a shell session, use the following form:
mongo --shell --norc alternate-environment.js
To print the result of a query as JSON from the system prompt using the --eval option, use the following form:
mongo --eval 'db.collection.find().forEach(printjson)'
Use single quotes (e.g. ') to enclose the JavaScript, as well as the additional JavaScript required to generate this output.
The mongod.exe and mongos.exe sections describe the options available for configuring MongoDB when running as a Windows Service. The mongod.exe and mongos.exe binaries provide a superset of the mongod and mongos options.
mongod.exe is the build of the MongoDB daemon (i.e. mongod) for the Windows platform. mongod.exe has all of the features of mongod on Unix-like platforms and is completely compatible with the other builds of mongod. In addition, mongod.exe provides several options for interacting with the Windows platform itself.
This document only references options that are unique to mongod.exe. All mongod options are available. See the “mongod” and the “Configuration File Options” documents for more information regarding mongod.exe.
To install and use mongod.exe, read the “Install MongoDB on Windows” document.
Installs mongod.exe as a Windows Service and exits.
Removes the mongod.exe Windows Service. If mongod.exe is running, this operation will stop and then remove the service.
Note
--remove requires the --serviceName if you configured a non-default --serviceName during the --install operation.
Removes mongod.exe and reinstalls mongod.exe as a Windows Service.
Default: “MongoDB”
Set the service name of mongod.exe when running as a Windows Service. Use this name with the net start <name> and net stop <name> operations.
You must use --serviceName in conjunction with either the --install or --remove option.
Default: “Mongo DB”
Sets the name listed for MongoDB on the Services administrative application.
Default: “MongoDB Server”
Sets the mongod.exe service description.
You must use --serviceDescription in conjunction with the --install option.
Note
For descriptions that contain spaces, you must enclose the description in quotes.
Runs the mongod.exe service in the context of a certain user. This user must have “Log on as a service” privileges.
You must use --serviceUser in conjunction with the --install option.
Sets the password for <user> for mongod.exe when running with the --serviceUser option.
You must use --servicePassword in conjunction with the --install option.
mongos.exe is the build of the MongoDB Shard (i.e. mongos) for the Windows platform. mongos.exe has all of the features of mongos on Unix-like platforms and is completely compatible with the other builds of mongos. In addition, mongos.exe provides several options for interacting with the Windows platform itself.
This document only references options that are unique to mongos.exe. All mongos options are available. See the “mongos” and the “Configuration File Options” documents for more information regarding mongos.exe.
To install and use mongos.exe, read the “Install MongoDB on Windows” document.
Installs mongos.exe as a Windows Service and exits.
Removes the mongos.exe Windows Service. If mongos.exe is running, this operation will stop and then remove the service.
Note
--remove requires the --serviceName if you configured a non-default --serviceName during the --install operation.
Removes mongos.exe and reinstalls mongos.exe as a Windows Service.
Default: “MongoS”
Set the service name of mongos.exe when running as a Windows Service. Use this name with the net start <name> and net stop <name> operations.
You must use --serviceName in conjunction with either the --install or --remove option.
Default: “Mongo DB Router”
Sets the name listed for MongoDB on the Services administrative application.
Default: “Mongo DB Sharding Router”
Sets the mongos.exe service description.
You must use --serviceDescription in conjunction with the --install option.
Note
For descriptions that contain spaces, you must enclose the description in quotes.
Runs the mongos.exe service in the context of a certain user. This user must have “Log on as a service” privileges.
You must use --serviceUser in conjunction with the --install option.
Sets the password for <user> for mongos.exe when running with the --serviceUser option.
You must use --servicePassword in conjunction with the --install option.
mongodump provides a method for creating BSON dump files from the mongod instances, while mongorestore makes it possible to restore these dumps. bsondump converts BSON dump files into JSON. The mongooplog utility provides the ability to stream oplog entries outside of normal replication.
mongodump is a utility for creating a binary export of the contents of a database. Consider using this utility as part an effective backup strategy. Use in conjunction with mongorestore to provide restore functionality.
Note
The format of data created by mongodump tool from the 2.2 distribution or later is different and incompatible with earlier versions of mongod.
See also
“mongorestore” and “Backup and Restoration Strategies”.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times, (e.g. -vvvvv.)
Returns the version of the mongodump utility and exits.
Specifies a resolvable hostname for the mongod that you wish to use to create the database dump. By default mongodump will attempt to connect to a MongoDB process running on the localhost on port number 27017.
Optionally, specify a port number to connect a MongoDB instance running on a port other than 27017.
To connect to a replica set, use the --host argument with a setname, followed by a slash and a comma-separated list of host names and port numbers. The mongodump utility will, given the seed of at least one connected set member, connect to the primary member of that set. This option would resemble:
mongodump --host repl0/mongo0.example.net,mongo0.example.net:27018,mongo1.example.net,mongo2.example.net
You can always connect directly to a single MongoDB instance by specifying the host and port number directly.
Specifies the port number, if the MongoDB instance is not running on the standard port. (i.e. 27017) You may also specify a port number using the --host option.
Enables IPv6 support that allows mongodump to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongodump, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username option to supply a username.
If you specify a --username without the --password option, mongodump will prompt for a password interactively.
Specifies the directory of the MongoDB data files. If used, the --dbpath option enables mongodump to attach directly to local data files and copy the data without the mongod. To run with --dbpath, mongodump needs to restrict access to the data directory: as a result, no mongod can access the same path while the process runs.
Use the --directoryperdb in conjunction with the corresponding option to mongod. This option allows mongodump to read data files organized with each database located in a distinct directory. This option is only relevant when specifying the --dbpath option.
Allows mongodump operations to use the durability journal to ensure that the export is in a consistent state. This option is only relevant when specifying the --dbpath option.
Use the --db option to specify a database for mongodump to backup. If you do not specify a DB, mongodump copies all databases in this instance into the dump files. Use this option to backup or copy a smaller subset of your data.
Use the --collection option to specify a collection for mongodump to backup. If you do not specify a collection, this option copies all collections in the specified database or instance to the dump files. Use this option to backup or copy a smaller subset of your data.
Specifies a path where mongodump will store the output of the database dump. To output the database dump to standard output, specify a - rather than a path.
Provides a query to optionally limit the documents included in the output of mongodump.
Use this option to ensure that mongodump creates a dump of the database that includes an oplog, to create a point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
--oplog has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog to dump individual shards.
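A point-in-time backup and restore pairs the two options as follows. The paths are hypothetical, and the commands are only printed here rather than executed, since running them requires a live deployment:

```shell
# Sketch: pair mongodump --oplog with mongorestore --oplogReplay for a
# point-in-time backup. Paths are hypothetical; the commands are
# printed, not executed, since they require a running mongod.
dump_cmd="mongodump --oplog --out /srv/backup/dump-2012-10-24"
restore_cmd="mongorestore --oplogReplay /srv/backup/dump-2012-10-24"
printf '%s\n%s\n' "$dump_cmd" "$restore_cmd"
```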
Use this option to run a repair option in addition to dumping the database. The repair option attempts to repair a database that may be in an inconsistent state as a result of an improper shutdown or mongod crash.
Forces mongodump to scan the data store directly: typically, mongodump saves entries as they appear in the index of the _id field. Use --forceTableScan to skip the index and scan the data directly. Typically there are two cases where this behavior is preferable to the default:
When you run with --forceTableScan, mongodump does not use $snapshot. As a result, the dump produced by mongodump can reflect the state of the database at many different points in time.
Warning
Use --forceTableScan with extreme caution and consideration.
Warning
Changed in version 2.2: When used in combination with fsync or db.fsyncLock(), mongod may block some reads, including those from mongodump, when a queued write operation waits behind the fsync lock.
When running mongodump against a mongos instance where the sharded cluster consists of replica sets, the read preference of the operation will prefer reads from secondary members of the set.
See the “backup guide section on database dumps” for a larger overview of mongodump usage. Also see the “mongorestore” document for an overview of the mongorestore, which provides the related inverse functionality.
The following command creates a dump file that contains only the collection named collection in the database named test. In this case the database is running on the local interface on port 27017:
mongodump --collection collection --db test
In the next example, mongodump creates a backup of the database instance stored in the /srv/mongodb directory on the local machine. This requires that no mongod instance is using the /srv/mongodb directory.
mongodump --dbpath /srv/mongodb
In the final example, mongodump creates a database dump located at /opt/backup/mongodump-2011-10-24, from a database running on port 37017 on the host mongodb1.example.net and authenticating using the username user and the password pass, as follows:
mongodump --host mongodb1.example.net --port 37017 --username user --password pass --out /opt/backup/mongodump-2011-10-24
The mongorestore tool imports content from a binary database dump created by mongodump into a specific database. mongorestore can import content to an existing database or create a new one.
mongorestore only performs inserts into the existing database and does not perform updates or upserts. If a document with the same _id already exists in the target database, mongorestore will not replace it.
mongorestore will recreate indexes from the dump.
The behavior of mongorestore has the following properties:
All operations are inserts, not updates.
All inserts are “fire and forget”; mongorestore does not wait for a response from a mongod to ensure that the MongoDB process has received or recorded the operation.
The mongod will record to its log any errors that occur during a restore operation, but mongorestore will not receive errors.
Note
The format of data created by the mongodump tool from the 2.2 distribution or later is different from and incompatible with earlier versions of mongod.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).
Returns the version of the mongorestore tool.
Specifies a resolvable hostname for the mongod to which you want to restore the database. By default mongorestore will attempt to connect to a MongoDB process running on the localhost port number 27017.
Optionally, specify a port number to connect a MongoDB instance running on a port other than 27017.
To connect to a replica set, you can specify the replica set seed name, and a seed list of set members, in the following format:
<replica_set_name>/<hostname1>:<port>,<hostname2>:<port>,...
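For example, assuming a replica set named rs0 with hypothetical member hostnames, the --host argument might look like this:

```shell
# Connect to the primary of replica set "rs0" by seeding two of its members.
# Set name, hostnames, and dump directory are hypothetical.
mongorestore --host rs0/mongodb0.example.net:27017,mongodb1.example.net:27017 dump/
```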
Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the --host option.
Enables IPv6 support that allows mongorestore to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongorestore, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongorestore --username option to supply a username.
If you specify a --username without the --password option, mongorestore will prompt for a password interactively.
Specifies the directory of the MongoDB data files. If used, the --dbpath option enables mongorestore to attach directly to local data files and insert the data without the mongod. To run with --dbpath, mongorestore needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.
Use the --directoryperdb in conjunction with the corresponding option to mongod, which allows mongorestore to import data into MongoDB instances that have every database’s files saved in discrete directories on the disk. This option is only relevant when specifying the --dbpath option.
Allows mongorestore to write to the durability journal to ensure that the data files will remain in a consistent state during the write process. This option is only relevant when specifying the --dbpath option.
Use the --db option to specify a database for mongorestore to restore data into. If the database doesn’t exist, mongorestore will create the specified database. If you do not specify a <db>, mongorestore creates new databases that correspond to the databases where data originated and data may be overwritten. Use this option to restore data into a MongoDB instance that already has data.
--db does not control which BSON files mongorestore restores. You must use the mongorestore path option to limit that restored data.
Use the --collection option to specify a collection for mongorestore to restore. If you do not specify a <collection>, mongorestore imports all collections created. Existing data may be overwritten. Use this option to restore data into a MongoDB instance that already has data, or to restore only some data in the specified imported data set.
Verifies each object as a valid BSON object before inserting it into the target database. If the object is not a valid BSON object, mongorestore will not insert the object into the target database and will stop processing the remaining documents for import. This option has some performance impact.
Limits the documents that mongorestore imports to only those documents that match the JSON document specified as '<JSON>'. Be sure to include the document in single quotes to avoid interaction with your system’s shell environment.
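For instance, to restore only documents matching a condition (the field name and value here are hypothetical):

```shell
# Restore only documents whose "accountType" field equals "premium".
# Single quotes keep the JSON document away from shell interpretation.
mongorestore --filter '{"accountType": "premium"}' dump/
```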
Modifies the restoration procedure to drop every collection from the target database before restoring the collection from the dumped backup.
Replays the oplog after restoring the dump to ensure that the current state of the database reflects the point-in-time backup captured with the “mongodump --oplog” command.
Prevents mongorestore from upgrading the index to the latest version during the restoration process.
New in version 2.2.
Specifies the write concern for each write operation that mongorestore writes to the target database. By default, mongorestore does not wait for a response for write acknowledgment.
New in version 2.2.
Prevents mongorestore from setting the collection options, such as those specified by the collMod database command, on restored collections.
New in version 2.2.
Prevents mongorestore from restoring and building indexes as specified in the corresponding mongodump output.
New in version 2.2.
Prevents mongorestore from applying oplog entries newer than the <timestamp>. Specify <timestamp> values in the form of <time_t>:<ordinal>, where <time_t> is the seconds since the UNIX epoch, and <ordinal> represents a counter of operations in the oplog that occurred in the specified second.
You must use --oplogLimit in conjunction with the --oplogReplay option.
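As a sketch, the seconds component of the <time_t>:<ordinal> pair can be computed from a calendar date with GNU date; the cutoff date and dump path are illustrative assumptions:

```shell
# Compute seconds since the UNIX epoch for the desired cutoff (GNU date syntax).
cutoff=$(date -u -d '2012-11-01 00:00:00' +%s)

# Replay oplog entries only up to that point in time; an ordinal of 0 means
# "before any operation recorded in that second".
mongorestore --oplogReplay --oplogLimit "${cutoff}:0" dump/
```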
The final argument of the mongorestore command is a directory path. This argument specifies the location of the database dump from which to restore.
See the “backup guide section on database dumps” for a larger overview of mongorestore usage. Also see the “mongodump” document for an overview of the mongodump, which provides the related inverse functionality.
Consider the following example:
mongorestore --collection people --db accounts dump/accounts/
Here, mongorestore reads the database dump in the dump/ sub-directory of the current directory, and restores only the documents in the collection named people from the database named accounts. mongorestore restores data to the instance running on the localhost interface on port 27017.
In the next example, mongorestore restores a backup of the database instance located in dump to a database instance stored in the /srv/mongodb on the local machine. This requires that there are no active mongod instances attached to /srv/mongodb data directory.
mongorestore --dbpath /srv/mongodb
In the final example, mongorestore restores a database dump located at /opt/backup/mongodump-2011-10-24, to a database running on port 37017 on the host mongodb1.example.net. mongorestore authenticates to this MongoDB instance using the username user and the password pass, as follows:
mongorestore --host mongodb1.example.net --port 37017 --username user --password pass /opt/backup/mongodump-2011-10-24
The bsondump utility converts BSON files into human-readable formats, including JSON. For example, bsondump is useful for reading the output files generated by mongodump.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).
Returns the version of the bsondump utility.
Validates each BSON object before outputting it in JSON format. Use this option to filter corrupt objects from the output. This option has some performance impact.
Limits the documents that bsondump exports to only those documents that match the JSON document specified as '<JSON>'. Be sure to include the document in single quotes to avoid interaction with your system’s shell environment.
By default, bsondump outputs data to standard output. To create corresponding JSON files, you will need to use the shell redirect. See the following command:
bsondump collection.bson > collection.json
Use the following command (at the system shell) to produce debugging output for a BSON file:
bsondump --type=debug collection.bson
New in version 2.1.1.
mongooplog is a simple tool that polls operations from the replication oplog of a remote server, and applies them to the local server. This capability supports certain classes of real-time migrations that require that the source server remain online and in operation throughout the migration process.
Typically this command will take the following form:
mongooplog --from mongodb0.example.net --host mongodb1.example.net
This command copies oplog entries from the mongod instance running on the host mongodb0.example.net and duplicates operations to the host mongodb1.example.net. If you do not need to keep the --from host running during the migration, consider using mongodump and mongorestore or another backup operation, which may be better suited to your operation.
Note
If the mongod instance specified by the --from argument is running with authentication, then mongooplog will not be able to copy oplog entries.
See also
mongodump, mongorestore, “Backup and Restoration Strategies,” “Oplog Internals Overview,” and “Replica Set Oplog Sizing”.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).
Returns the version of the mongooplog utility.
Specifies a resolvable hostname for the mongod instance to which mongooplog will apply oplog operations retrieved from the server specified by the --from option.
mongooplog assumes that all target mongod instances are accessible by way of port 27017. You may, optionally, declare an alternate port number as part of the hostname argument.
You can always connect directly to a single mongod instance by specifying the host and port number directly.
To connect to a replica set, you can specify the replica set seed name, and a seed list of set members, in the following format:
<replica_set_name>/<hostname1>:<port>,<hostname2>:<port>,...
Specifies the port number of the mongod instance where mongooplog will apply oplog entries. Only specify this option if the MongoDB instance that you wish to connect to is not running on the standard port (i.e. 27017). You may also specify a port number using the --host option.
Enables IPv6 support that allows mongooplog to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongooplog, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username option to supply a username.
If you specify a --username without the --password option, mongooplog will prompt for a password interactively.
Specifies a directory, containing MongoDB data files, to which mongooplog will apply operations from the oplog of the database specified with the --from option. When used, the --dbpath option enables mongooplog to attach directly to local data files and write data without a running mongod instance. To run with --dbpath, mongooplog needs to restrict access to the data directory: as a result, no mongod can access the same path while the process runs.
Use the --directoryperdb in conjunction with the corresponding option to mongod. This option allows mongooplog to write to data files organized with each database located in a distinct directory. This option is only relevant when specifying the --dbpath option.
Allows mongooplog operations to use the durability journal to ensure that the data files will remain in a consistent state during the writing process. This option is only relevant when specifying the --dbpath option.
Specify a field or number of fields to constrain which data mongooplog will migrate. All other fields will be excluded from the migration. Use a comma-separated list of fields to limit the applied fields.
As an alternative to --fields, the --fieldFile option allows you to specify a file (e.g. <file>) that holds a list of field names to include in the migration. All other fields will be excluded from the migration. Place one field per line.
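A field file is a plain text file with one field name per line; for example (field names and hosts hypothetical):

```shell
# Create a field file listing the fields to migrate, one per line.
cat > fields.txt <<EOF
_id
name
status
EOF

# Migrate only those fields from the remote oplog to the target host.
mongooplog --from mongodb0.example.net --host mongodb1.example.net --fieldFile fields.txt
```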
Specify a number of seconds of operations for mongooplog to pull from the remote host. Unless specified the default value is 86400 seconds, or 24 hours.
Specify the host for mongooplog to retrieve oplog operations from. mongooplog requires this option.
Unless you specify the --host option, mongooplog will apply the operations collected with this option to the oplog of the mongod instance running on the localhost interface connected to port 27017.
Specify a namespace in the --from host where the oplog resides. The default value is local.oplog.rs, which is where replica set members store their operation log. However, if you’ve copied oplog entries into another database or collection, use this option to copy oplog entries stored in another location.
Namespaces take the form of [database].[collection].
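For example, if oplog entries were previously copied into a hypothetical local.temp_oplog collection on the source host, you might replay from that namespace:

```shell
# Read operations from a non-default oplog namespace on the --from host.
# Hostname and namespace are hypothetical.
mongooplog --from mongodb0.example.net --oplogns local.temp_oplog
```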
Consider the following prototype mongooplog command:
mongooplog --from mongodb0.example.net --host mongodb1.example.net
Here, mongooplog copies oplog entries from the mongod running on the host mongodb0.example.net and applies them to the host mongodb1.example.net. By default this pulls only entries from the last 24 hours.
In the next command, the parameters limit this operation to only apply operations to the database people in the collection usage on the target host (i.e. mongodb1.example.net):
mongooplog --from mongodb0.example.net --host mongodb1.example.net --database people --collection usage
This operation only applies oplog entries from the last 24 hours. Use the --seconds argument to capture a greater or smaller amount of time. Consider the following example:
mongooplog --from mongodb0.example.net --seconds 172800
In this operation, mongooplog captures 2 full days of operations. To migrate 12 hours of oplog entries, use the following form:
mongooplog --from mongodb0.example.net --seconds 43200
For the previous two examples, mongooplog migrates entries to the mongod process running on the localhost interface connected to the 27017 port. mongooplog can also operate directly on MongoDB’s data files if no mongod is running on the target host. Consider the following example:
mongooplog --from mongodb0.example.net --dbpath /srv/mongodb --journal
Here, mongooplog imports oplog operations from the mongod host connected to port 27017. This migrates operations to the MongoDB data files stored in the /srv/mongodb directory. Additionally mongooplog will use the durability journal to ensure that the data files remain in a consistent state.
mongoimport provides a method for taking data in JSON, CSV, or TSV and importing it into a mongod instance. mongoexport provides a method to export data from a mongod instance into JSON, CSV, or TSV.
Note
The conversion between BSON and other formats lacks full type fidelity. Therefore you cannot use mongoimport and mongoexport for round-trip import and export operations.
The mongoimport tool provides a route to import content from a JSON, CSV, or TSV export created by mongoexport, or potentially, another third-party export tool. See the “Importing and Exporting MongoDB Data” document for a more in depth usage overview, and the “mongoexport” document for more information regarding mongoexport, which provides the inverse “importing” capability.
Note
Do not use mongoimport and mongoexport for full instance, production backups because they will not reliably capture data type information. Use mongodump and mongorestore as described in “Backup and Restoration Strategies” for this kind of functionality.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).
Returns the version of the mongoimport program.
Specifies a resolvable hostname for the mongod to which you want to import the data. By default mongoimport will attempt to connect to a MongoDB process running on the localhost port number 27017.
Optionally, specify a port number to connect a MongoDB instance running on a port other than 27017.
To connect to a replica set, use the --host argument with the set name, followed by a slash and a comma-separated list of host and port names. Given the seed of at least one connected set member, mongoimport will connect to the primary node of that set. This option would resemble:
--host repl0/mongo0.example.net,mongo0.example.net:27018,mongo1.example.net,mongo2.example.net
You can always connect directly to a single MongoDB instance by specifying the host and port number directly.
Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongoimport --host option.
Enables IPv6 support that allows mongoimport to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongoimport, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongoimport --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongoimport --username option to supply a username.
If you specify a --username without the --password option, mongoimport will prompt for a password interactively.
Specifies the directory of the MongoDB data files. If used, the --dbpath option enables mongoimport to attach directly to local data files and insert the data without the mongod. To run with --dbpath, mongoimport needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.
Use the --directoryperdb in conjunction with the corresponding option to mongod, which allows mongoimport to import data into MongoDB instances that have every database’s files saved in discrete directories on the disk. This option is only relevant when specifying the --dbpath option.
Allows mongoimport to write to the durability journal to ensure that the data files will remain in a consistent state during the write process. This option is only relevant when specifying the --dbpath option.
Use the --db option to specify a database for mongoimport to restore data into. If you do not specify a <db>, mongoimport creates new databases that correspond to the databases where the data originated, and existing data may be overwritten. Use this option to restore data into a MongoDB instance that already has data, or to restore only some data in the specified backup.
Use the --collection option to specify a collection for mongoimport to import. If you do not specify a <collection>, mongoimport imports all collections. Existing data may be overwritten. Use this option to restore data into a MongoDB instance that already has data, or to restore only some data in the specified imported data set.
Specify a field or number of fields to import from the specified file. All other fields present in the export will be excluded during importation. Use a comma-separated list of fields to limit the fields imported.
As an alternative to mongoimport --fields, the --fieldFile option allows you to specify a file (e.g. <file>) that holds a list of field names to include in the import. All other fields will be excluded from the import. Place one field per line.
For csv and tsv imports, ignore empty fields. If not specified, mongoimport creates fields without values in imported documents.
Declare the type of export format to import. The default format is JSON, but it’s possible to import csv and tsv files.
Specify the location of a file containing the data to import. mongoimport will read data from standard input (e.g. “stdin.”) if you do not specify a file.
Modifies the importation procedure so that the target instance drops every collection before restoring the collection from the dumped backup.
If using “--type csv” or “--type tsv,” use the first line as field names. Otherwise, mongoimport will import the first line as a distinct document.
Modifies the import process to update existing objects in the database if they match an imported object, while inserting all other objects.
If you do not specify a field or fields using the --upsertFields mongoimport will upsert on the basis of the _id field.
Specifies a list of fields for the query portion of the upsert. Use this option if the _id fields in the existing documents don’t match the field in the document, but another field or field combination can uniquely identify documents as a basis for performing upsert operations.
To ensure adequate performance, indexes should exist for this field or fields.
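As a sketch, assuming documents that can be uniquely identified by a hypothetical email field:

```shell
# Upsert imported documents, matching existing documents on "email"
# rather than on _id. An index on "email" keeps this performant.
# Database, collection, file, and field names are hypothetical.
mongoimport --db users --collection contacts --file contacts.json --upsert --upsertFields email
```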
New in version 2.2.
Forces mongoimport to halt the import operation at the first error rather than continuing the operation despite errors.
Changed in version 2.2: The limit on document size increased from 4MB to 16MB.
Accepts the import of data expressed with multiple MongoDB documents within a single JSON array.
Use in conjunction with mongoexport --jsonArray to import data written as a single JSON array. Limited to imports of 16 MB or smaller.
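For example, the two options can be paired to move a small collection between instances through a pipe; hosts and names here are hypothetical:

```shell
# Export a collection as one JSON array and import it into another instance.
# Limited to payloads of 16 MB or smaller.
mongoexport --db test --collection data --jsonArray | \
  mongoimport --host mongodb1.example.net --db test --collection data --jsonArray
```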
In this example, mongoimport imports the csv formatted data in the /opt/backups/contacts.csv into the collection contacts in the users database on the MongoDB instance running on the localhost port numbered 27017.
mongoimport --db users --collection contacts --type csv --file /opt/backups/contacts.csv
In the following example, mongoimport imports the data in the JSON formatted file contacts.json into the collection contacts on the MongoDB instance running on the localhost port number 27017. Journaling is explicitly enabled.
mongoimport --collection contacts --file contacts.json --journal
In the next example, mongoimport takes data passed to it on standard input (i.e. with a | pipe) and imports it into the collection contacts in the sales database in the MongoDB data files located at /srv/mongodb/. If the import process encounters an error, mongoimport will halt because of the --stopOnError option.
mongoimport --db sales --collection contacts --stopOnError --dbpath /srv/mongodb/
In the final example, mongoimport imports data from the file /opt/backups/mdb1-examplenet.json into the collection contacts within the database marketing on a remote MongoDB database. This mongoimport accesses the mongod instance running on the host mongodb1.example.net over port 37017, which requires the username user and the password pass.
mongoimport --host mongodb1.example.net --port 37017 --username user --password pass --collection contacts --db marketing --file /opt/backups/mdb1-examplenet.json
mongoexport is a utility that produces a JSON or CSV export of data stored in a MongoDB instance. See the “Importing and Exporting MongoDB Data” document for a more in depth usage overview, and the “mongoimport” document for more information regarding the mongoimport utility, which provides the inverse “importing” capability.
Note
Do not use mongoimport and mongoexport for full-scale backups because they may not reliably capture data type information. Use mongodump and mongorestore as described in “Backup and Restoration Strategies” for this kind of functionality.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).
Returns the version of the mongoexport utility.
Specifies a resolvable hostname for the mongod from which you want to export data. By default mongoexport attempts to connect to a MongoDB process running on the localhost port number 27017.
Optionally, specify a port number to connect a MongoDB instance running on a port other than 27017.
To connect to a replica set, you can specify the replica set seed name, and a seed list of set members, in the following format:
<replica_set_name>/<hostname1>:<port>,<hostname2>:<port>,...
Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongoexport --host option.
Enables IPv6 support that allows mongoexport to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongoexport, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongoexport --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username option to supply a username.
If you specify a --username without the --password option, mongoexport will prompt for a password interactively.
Specifies the directory of the MongoDB data files. If used, the --dbpath option enables mongoexport to attach directly to local data files and insert the data without the mongod. To run with --dbpath, mongoexport needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.
Use the --directoryperdb in conjunction with the corresponding option to mongod, which allows mongoexport to export data into MongoDB instances that have every database’s files saved in discrete directories on the disk. This option is only relevant when specifying the --dbpath option.
Allows mongoexport operations to access the durability journal to ensure that the export is in a consistent state. This option is only relevant when specifying the --dbpath option.
Use the --db option to specify the name of the database that contains the collection you want to export.
Use the --collection option to specify the collection that you want mongoexport to export.
Specify a field or number fields to include in the export. All other fields will be excluded from the export. Comma separate a list of fields to limit the fields exported.
As an alternative to --fields, the --fieldFile option allows you to specify a file (e.g. <file>) that holds a list of field names to include in the export. All other fields will be excluded from the export. Place one field per line.
Provides a JSON document as a query that optionally limits the documents returned in the export.
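For example, to export only matching documents (the field name and value are hypothetical):

```shell
# Export only documents whose "status" field equals "active".
# Single quotes protect the JSON document from the shell.
mongoexport --db sales --collection contacts --query '{"status": "active"}'
```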
Changes the export format to a comma separated values (CSV) format. By default mongoexport writes data using one JSON document for every MongoDB document.
Modifies the output of mongoexport to write the entire contents of the export as a single JSON array. By default mongoexport writes data using one JSON document for every MongoDB document.
Allows mongoexport to read data from secondary or slave nodes when using mongoexport with a replica set. This option is only available if connected to a mongod or mongos and is not available when used with the “mongoexport --dbpath” option.
This is the default behavior.
Specify a file to write the export to. If you do not specify a file name, mongoexport writes data to standard output (e.g. stdout).
In the following example, mongoexport exports the collection contacts from the users database from the mongod instance running on the localhost port number 27017. This command writes the export data in CSV format into a file located at /opt/backups/contacts.csv.
mongoexport --db users --collection contacts --csv --out /opt/backups/contacts.csv
The next example creates an export of the collection contacts from the MongoDB instance running on the localhost port number 27017, with journaling explicitly enabled. This writes the export to the contacts.json file in JSON format.
mongoexport --db sales --collection contacts --out contacts.json --journal
The following example exports the collection contacts from the sales database located in the MongoDB data files located at /srv/mongodb/. This operation writes the export to standard output in JSON format.
mongoexport --db sales --collection contacts --dbpath /srv/mongodb/
Warning
The above example will only succeed if there is no mongod connected to the data files located in the /srv/mongodb/ directory.
The final example exports the collection contacts from the database marketing. This data resides on the MongoDB instance located on the host mongodb1.example.net running on port 37017, which requires the username user and the password pass.
mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection contacts --db marketing --out mdb1-examplenet.json
mongostat, mongotop, and mongosniff provide diagnostic information related to the current operation of a mongod instance.
Note
Because mongosniff depends on libpcap, most distributions of MongoDB do not include mongosniff.
The mongostat utility provides a quick overview of the status of a currently running mongod or mongos instance. mongostat is functionally similar to the UNIX/Linux file system utility vmstat, but provides data regarding mongod and mongos instances.
See also
For more information about monitoring MongoDB, see Monitoring Database Systems.
For more background on various other MongoDB status outputs see:
For an additional utility that provides MongoDB metrics see “mongotop.”
mongostat connects to the mongod instance running on the local host interface on TCP port 27017; however, mongostat can connect to any accessible remote mongod instance.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).
Returns the version of the mongostat utility.
Specifies a resolvable hostname for the mongod from which mongostat will collect data. By default mongostat attempts to connect to a MongoDB instance running on the localhost port number 27017.
Optionally, specify a port number to connect a MongoDB instance running on a port other than 27017.
To connect to a replica set, you can specify the replica set seed name, and a seed list of set members, in the following format:
<replica_set_name>/<hostname1>:<port>,<hostname2>:<port>,...
Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongostat --host option.
Enables IPv6 support that allows mongostat to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongostat, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongostat --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongostat --username option to supply a username.
If you specify a --username without the --password option, mongostat will prompt for a password interactively.
Disables the output of column or field names.
Controls the number of rows to output. Use in conjunction with the <sleeptime> argument to control the duration of a mongostat operation.
Unless you specify --rowcount, mongostat returns rows indefinitely (i.e. the default value of 0).
Configures mongostat to collect data using the HTTP interface rather than a raw database connection.
With this option mongostat discovers and reports on statistics from all members of a replica set or sharded cluster. When connected to any member of a replica set, --discover reports on all non-hidden members of the set. When connected to a mongos, mongostat returns data from all shards in the cluster. If a replica set provides a shard in the sharded cluster, mongostat reports on the non-hidden members of that replica set.
The mongostat --host option is not required but potentially useful in this case.
The final argument is the length of time, in seconds, that mongostat waits in between calls. By default mongostat returns one call every second.
mongostat returns values that reflect the operations over a 1 second period. When <sleeptime> has a value greater than 1, mongostat averages the statistics to reflect average operations per second.
mongostat outputs the following fields:
The number of objects inserted into the database per second. If followed by an asterisk (e.g. *), the datum refers to a replicated operation.
The number of query operations per second.
The number of update operations per second.
The number of delete operations per second.
The number of get more (i.e. cursor batch) operations per second.
The number of commands per second. On slave and secondary systems, mongostat presents two values separated by a pipe character (e.g. |), in the form of local|replicated commands.
The number of fsync operations per second.
The total amount of data mapped in megabytes. This is the total data size at the time of the last mongostat call.
The amount of (virtual) memory in megabytes used by the process at the time of the last mongostat call.
The amount of (resident) memory in megabytes used by the process at the time of the last mongostat call.
Changed in version 2.1.
The number of page faults per second.
Before version 2.1 this value was only provided for MongoDB instances running on Linux hosts.
The percent of time in a global write lock.
Changed in version 2.2: The locked db field replaces the locked % field to provide more appropriate data regarding the database-specific locks in version 2.2.
New in version 2.2.
The percent of time in the per-database context-specific lock. mongostat will report the database that has spent the most time since the last mongostat call with a write lock.
This value includes the time that the database held a database-specific lock as well as the time that the mongod spent in the global lock. Because of this, and the sampling method, you may see some values greater than 100%.
The percent of index access attempts that required a page fault to load a btree node. This is a sampled value.
The length of the queue of clients waiting to read data from the MongoDB instance.
The length of the queue of clients waiting to write data to the MongoDB instance.
The number of active clients performing read operations.
The number of active clients performing write operations.
The amount of network traffic, in bytes, received by the MongoDB instance.
This includes traffic from mongostat itself.
The amount of network traffic, in bytes, sent by the MongoDB instance.
This includes traffic from mongostat itself.
The total number of open connections.
The name, if applicable, of the replica set.
In the first example, mongostat will return data every second for 20 seconds. mongostat collects data from the mongod instance running on the localhost interface on port 27017. All of the following invocations produce identical behavior:
mongostat --rowcount 20 1
mongostat --rowcount 20
mongostat -n 20 1
mongostat -n 20
In the next example, mongostat returns data every 5 minutes (or 300 seconds) for as long as the program runs. mongostat collects data from the mongod instance running on the localhost interface on port 27017. Both of the following invocations produce identical behavior.
mongostat --rowcount 0 300
mongostat -n 0 300
mongostat 300
In the following example, mongostat returns data every 5 minutes for an hour (12 times.) mongostat collects data from the mongod instance running on the localhost interface on port 27017. Both of the following invocations produce identical behavior.
mongostat --rowcount 12 300
mongostat -n 12 300
In many cases, using the --discover option will help provide a more complete snapshot of the state of an entire group of machines. If a mongos process connected to a sharded cluster is running on port 27017 of the local machine, you can use the following form to return statistics from all members of the cluster:
mongostat --discover
mongotop provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level. By default, mongotop returns values every second.
See also
For more information about monitoring MongoDB, see Monitoring Database Systems.
For additional background on various other MongoDB status outputs see:
For an additional utility that provides MongoDB metrics see “mongostat.”
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times, (e.g. -vvvvv.)
Print the version of the mongotop utility and exit.
Specifies a resolvable hostname for the mongod from which you want to export data. By default mongotop attempts to connect to a MongoDB process running on the localhost port number 27017.
Optionally, specify a port number to connect a MongoDB instance running on a port other than 27017.
To connect to a replica set, you can specify the replica set seed name, and a seed list of set members, in the following format:
<replica_set_name>/<hostname1><:port>,<hostname2><:port>,...
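For example, to connect mongotop to a replica set using a seed list of two members; the set name and hostnames shown here are illustrative placeholders:

```shell
# Connect to the replica set "rs0" via a seed list of two members.
# "rs0", "db1.example.net", and "db2.example.net" are placeholder values.
mongotop --host rs0/db1.example.net:27017,db2.example.net:27017
```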
Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongotop --host option.
Enables IPv6 support that allows mongotop to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongotop, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongotop --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username option to supply a username.
If you specify a --username without the --password option, mongotop will prompt for a password interactively.
New in version 2.2.
Toggles the mode of mongotop to report on use of per-database locks. These data are useful for measuring concurrent operations and lock percentage.
The final argument is the length of time, in seconds, that mongotop waits in between calls. By default mongotop returns data every second.
mongotop returns time values specified in milliseconds (ms.)
mongotop only reports active namespaces or databases, depending on the --locks option. If you don’t see a database or collection, it has received no recent activity. You can issue a simple operation in the mongo shell to generate activity to affect the output of mongotop.
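For instance, issuing a write and a read from the mongo shell generates activity that will appear in the next mongotop sample; the test.records namespace here is arbitrary:

```javascript
// Run in the mongo shell; "test.records" is an arbitrary namespace.
db = db.getSiblingDB("test")
db.records.insert( { status: "sample activity for mongotop" } )  // write activity
db.records.find().count()  // read activity, visible in the read column
```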
Contains the database namespace, which combines the database name and collection.
Changed in version 2.2: If you use the --locks option, the ns field does not appear in the mongotop output.
New in version 2.2.
Contains the name of the database. The database named . refers to the global lock, rather than a specific database.
This field does not appear unless you have invoked mongotop with the --locks option.
Provides the total amount of time that this mongod spent operating on this namespace.
Provides the amount of time that this mongod spent performing read operations on this namespace.
Provides the amount of time that this mongod spent performing write operations on this namespace.
Provides a time stamp for the returned data.
By default mongotop connects to the MongoDB instance running on the localhost port 27017. However, mongotop can optionally connect to remote mongod instances. See the mongotop options for more information.
To force mongotop to return less frequently, specify a number, in seconds, at the end of the command. In this example, mongotop will return every 15 seconds.
mongotop 15
This command produces the following output:
connected to: 127.0.0.1
ns total read write 2012-08-13T15:45:40
test.system.namespaces 0ms 0ms 0ms
local.system.replset 0ms 0ms 0ms
local.system.indexes 0ms 0ms 0ms
admin.system.indexes 0ms 0ms 0ms
admin. 0ms 0ms 0ms
ns total read write 2012-08-13T15:45:55
test.system.namespaces 0ms 0ms 0ms
local.system.replset 0ms 0ms 0ms
local.system.indexes 0ms 0ms 0ms
admin.system.indexes 0ms 0ms 0ms
admin. 0ms 0ms 0ms
To return a mongotop report every 5 minutes, use the following command:
mongotop 300
To report the use of per-database locks, use mongotop --locks, which produces the following output:
$ mongotop --locks
connected to: 127.0.0.1
db total read write 2012-08-13T16:33:34
local 0ms 0ms 0ms
admin 0ms 0ms 0ms
. 0ms 0ms 0ms
mongosniff provides a low-level operation tracing/sniffing view into database activity in real time. Think of mongosniff as a MongoDB-specific analogue of tcpdump for TCP/IP network traffic. Typically, mongosniff is most frequently used in driver development.
Note
mongosniff requires libpcap and is only available for Unix-like systems. Furthermore, the version distributed with the MongoDB binaries is dynamically linked against version 0.9 of libpcap. If your system has a different version of libpcap, you will need to compile mongosniff yourself or create a symbolic link from libpcap.so.0.9 to your local version of libpcap. Use an operation that resembles the following:
ln -s /usr/lib/libpcap.so.1.1.1 /usr/lib/libpcap.so.0.9
Change the path and name of the shared library as needed.
As an alternative to mongosniff, Wireshark, a popular network sniffing tool, is capable of inspecting and parsing the MongoDB wire protocol.
Returns a basic help and usage text.
Declares a host to forward all parsed requests that mongosniff intercepts to another mongod instance, issuing those operations on that database instance.
Specify the target host name and port in the <host>:<port> format.
To connect to a replica set, you can specify the replica set seed name, and a seed list of set members, in the following format:
<replica_set_name>/<hostname1><:port>,<hostname2><:port>,...
Specifies source material to inspect. Use --source NET [interface] to inspect traffic from a network interface (e.g. eth0 or lo.) Use --source FILE [filename] to read captured packets in pcap format.
You may use the --source DIAGLOG [filename] option to read the output files produced by the --diaglog option.
Modifies the behavior to only display invalid BSON objects and nothing else. Use this option for troubleshooting driver development. This option has some impact on the performance of mongosniff.
Specifies alternate ports to sniff for traffic. By default, mongosniff watches for MongoDB traffic on port 27017. Append multiple port numbers to the end of mongosniff to monitor traffic on multiple ports.
Use the following command to connect to a mongod or mongos running on port 27017 and 27018 on the localhost interface:
mongosniff --source NET lo 27017 27018
Use the following command to only log invalid BSON objects for the mongod or mongos running on the localhost interface and port 27018, for driver development and troubleshooting:
mongosniff --objcheck --source NET lo 27018
To build mongosniff yourself, Linux users can use the following procedure:
Obtain prerequisites using your operating systems package management software. Dependencies include:
Download a copy of the MongoDB source code using git:
git clone git://github.com/mongodb/mongo.git
Issue the following sequence of commands to change to the mongo/ directory and build mongosniff:
cd mongo
scons mongosniff
mongofiles provides a command-line interface to a MongoDB GridFS storage system.
The mongofiles utility makes it possible to manipulate files stored in your MongoDB instance in GridFS objects from the command line. It is particularly useful as it provides an interface between objects stored in your file system and GridFS.
All mongofiles commands take arguments in three groups:
mongofiles, like mongodump, mongoexport, mongoimport, and mongorestore, can access data stored in a MongoDB data directory without requiring a running mongod instance, if no other mongod is running.
Note
For replica sets, mongofiles can only read from the set’s primary.
Lists the files in the GridFS store. The characters specified after list (e.g. <prefix>) optionally limit the list of returned items to files that begin with that string of characters.
Lists the files in the GridFS store with names that match any portion of <string>.
Copy the specified file from the local file system into GridFS storage.
Here, <filename> refers to the name the object will have in GridFS, and mongofiles assumes that this reflects the name the file has on the local file system. If the local filename is different, use the mongofiles --local option.
Copy the specified file from GridFS storage to the local file system.
Here, <filename> refers to the name the object will have in GridFS, and mongofiles assumes that this reflects the name the file has on the local file system. If the local filename is different, use the mongofiles --local option.
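For example, to store a file under a GridFS name that differs from its local path; the paths, database name, and filenames here are illustrative placeholders:

```shell
# Store the local file /tmp/draft-copy.lp in GridFS under the name 32-corinth.lp.
# The paths, the "records" database, and the filenames are placeholder values.
mongofiles -d records --local /tmp/draft-copy.lp put 32-corinth.lp
```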
Delete the specified file from GridFS storage.
Returns a basic help and usage text.
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times, (e.g. -vvvvv.)
Returns the version of the mongofiles utility.
Specifies a resolvable hostname for the mongod that holds your GridFS system. By default mongofiles attempts to connect to a MongoDB process running on the localhost port number 27017.
Optionally, specify a port number to connect a MongoDB instance running on a port other than 27017.
Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongofiles --host option.
Enables IPv6 support that allows mongofiles to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongofiles, disable IPv6 support by default.
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongofiles --password option to supply a password.
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongofiles --username option to supply a username.
If you specify a --username without the --password option, mongofiles will prompt for a password interactively.
Specifies the directory of the MongoDB data files. If used, the --dbpath option enables mongofiles to attach directly to local data files and interact with the GridFS data without the mongod. To run with --dbpath, mongofiles needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.
Use --directoryperdb in conjunction with the corresponding option to mongod. This option allows mongofiles, when running with the --dbpath option, to access data from MongoDB instances that use an on-disk format where every database has a distinct directory. This option is only relevant when specifying the --dbpath option.
Allows mongofiles operations to use the durability journal when running with --dbpath to ensure that the database maintains a recoverable state. This forces mongofiles to record all data on disk regularly.
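For example, with no mongod holding the data directory, the following lists GridFS files directly from the data files; the data directory path and database name are illustrative placeholders:

```shell
# Access the data files directly; no mongod may hold this dbpath while it runs.
# /srv/mongodb and the "records" database are placeholder values.
mongofiles --dbpath /srv/mongodb --journal -d records list
```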
Use the --db option to specify the MongoDB database that stores or will store the GridFS files.
This option has no use in this context and a future release may remove it. See SERVER-4931 for more information.
Specifies the local filesystem name of a file for get and put operations.
In the mongofiles put and mongofiles get commands the required <filename> modifier refers to the name the object will have in GridFS. mongofiles assumes that this reflects the file’s name on the local file system. This setting overrides this default.
Provides the ability to specify a MIME type to describe the file inserted into GridFS storage. mongofiles omits this option in the default operation.
Use only with mongofiles put operations.
Alters the behavior of mongofiles put to replace existing GridFS objects with the specified local file, rather than adding an additional object with the same name.
In the default operation, files will not be overwritten by a mongofiles put operation.
To return a list of all files in a GridFS collection in the records database, use the following invocation at the system shell:
mongofiles -d records list
This mongofiles instance will connect to the mongod instance running on the localhost interface on port 27017. To perform the same operation against a different port or hostname, issue a command that resembles one of the following:
mongofiles --port 37017 -d records list
mongofiles --hostname db1.example.net -d records list
mongofiles --hostname db1.example.net --port 37017 -d records list
Modify any of the following commands as needed if you’re connecting to mongod instances on different ports or hosts.
To upload a file named 32-corinth.lp to the GridFS collection in the records database, you can use the following command:
mongofiles -d records put 32-corinth.lp
To delete the 32-corinth.lp file from this GridFS collection in the records database, you can use the following command:
mongofiles -d records delete 32-corinth.lp
To search for files in the GridFS collection in the records database that have the string corinth in their names, you can use the following command:
mongofiles -d records search corinth
To list all files in the GridFS collection in the records database that begin with the string 32, you can use the following command:
mongofiles -d records list 32
To fetch the file from the GridFS collection in the records database named 32-corinth.lp, you can use the following command:
mongofiles -d records get 32-corinth.lp
Administrators and users can control mongod or mongos instances at runtime either directly from mongod’s command line arguments or using a configuration file.
While both methods are functionally equivalent and all settings are similar, the configuration file method is preferable. If you installed from a package and have started MongoDB using your system’s control script, you’re already using a configuration file.
To start mongod or mongos using a config file, use one of the following forms:
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf
mongos --config /srv/mongodb/mongos.conf
mongos -f /srv/mongodb/mongos.conf
Declare all settings in this file using the following form:
<setting> = <value>
New in version 2.0: Before version 2.0, Boolean (i.e. true|false) or “flag” parameters registered as true if they appeared in the configuration file, regardless of their value.
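A minimal configuration file using this form might resemble the following sketch; the paths and values shown are illustrative, not recommendations:

```ini
# /etc/mongodb.conf -- illustrative values only
port = 27017
fork = true
logpath = /var/log/mongodb/mongod.log
logappend = true
dbpath = /srv/mongodb
journal = true
```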
Default: false
Increases the amount of internal reporting returned on standard output or in the log file generated by logpath. To enable verbose or to enable increased verbosity with vvvv, set these options as in the following example:
verbose = true
vvvv = true
MongoDB has the following levels of verbosity:
Default: false
Additional increase in verbosity of output and logging.
Default: false
Additional increase in verbosity of output and logging.
Default: false
Additional increase in verbosity of output and logging.
Default: false
Additional increase in verbosity of output and logging.
Default: false
Runs the mongod or mongos instance in a quiet mode that attempts to limit the amount of output. This option suppresses:
Default: 27017
Specifies a TCP port for the mongod or mongos instance to listen for client connections. UNIX-like systems require root access for ports with numbers lower than 1000.
Default: All interfaces.
Set this option to configure the mongod or mongos process to bind to and listen for connections from applications on this address. You may attach mongod or mongos instances to any interface; however, if you attach the process to a publicly accessible interface, implement proper authentication or firewall restrictions to protect the integrity of your database.
You may concatenate a list of comma separated values to bind mongod to multiple IP addresses.
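For example, to bind to the loopback interface and one private address; the addresses here are illustrative:

```ini
# Listen on the loopback interface and one private address (placeholder values).
bind_ip = 127.0.0.1,10.8.0.10
```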
Default: depends on system (i.e. ulimit and file descriptor) limits. Unless set, MongoDB will not limit its own connections.
Specifies a value to set the maximum number of simultaneous connections that mongod or mongos will accept. This setting has no effect if it is higher than your operating system’s configured maximum connection tracking threshold.
This is particularly useful for mongos if you have a client that creates a number of connections but allows them to timeout rather than close the connections. When you set maxConns, ensure the value is slightly higher than the size of the connection pool or the total number of connections to prevent erroneous connection spikes from propagating to the members of a shard cluster.
Note
You cannot set maxConns to a value higher than 20000.
Default: false
Set to true to force mongod to validate all requests from clients upon receipt to ensure that invalid BSON objects are never inserted into the database. mongod does not enable this by default because of the required overhead.
Default: None. (i.e. /dev/stdout)
Specify the path to a file name for the log file that will hold all diagnostic logging information.
Unless specified, mongod will output all log information to the standard output. Unless logappend is true, the logfile will be overwritten when the process restarts.
Note
Currently, MongoDB will overwrite the contents of the log file if the logappend is not used. This behavior may change in the future depending on the outcome of SERVER-4499.
Default: false
Set to true to add new entries to the end of the logfile rather than overwriting the content of the log when the process restarts.
If this setting is not specified, then MongoDB will overwrite the existing logfile upon start up.
Note
The behavior of the logging system may change in the near future in response to the SERVER-4499 case.
New in version 2.1.0.
Sends all logging output to the host’s syslog system rather than to standard output or a log file as with logpath.
Default: None.
Specify a file location to hold the “PID” or process ID of the mongod process. Useful for tracking the mongod process in combination with the fork setting.
Without this option, mongod creates no PID file.
Default: None.
Specify the path to a key file to store authentication information. This option is only useful for the connection between replica set members.
See also
Default: false
Set to true to disable listening on the UNIX socket. Unless set to true, mongod and mongos provide a UNIX socket.
Default: /tmp
Specifies a path for the UNIX socket. Unless this option has a value, mongod and mongos create a socket with /tmp as a prefix.
Default: false
Set to true to enable a daemon mode for mongod that runs the process in the background.
Default: false
Set to true to enable database authentication for users connecting from remote hosts. Configure users via the mongo shell. If no users exist, the localhost interface will continue to have access to the database until you create the first user.
Default: false
Set to true to force mongod to report, every four seconds, CPU utilization and the amount of time that the processor waits for I/O operations to complete (i.e. I/O wait). MongoDB writes this data to standard output, or to the logfile if using the logpath option.
Default: /data/db/
Set this value to designate a directory for the mongod instance to store its data. Typical locations include: /srv/mongodb, /var/lib/mongodb or /opt/mongodb
Unless specified, mongod will look for data files in the default /data/db directory. (Windows systems use the \data\db directory.) If you installed using a package management system, check the /etc/mongodb.conf file provided by your packages to see the configuration of the dbpath.
Default: 0
Creates a very verbose, diagnostic log for troubleshooting and recording various errors. MongoDB writes these log files in the dbpath directory in a series of files that begin with the string diaglog with the time logging was initiated appended as a hex string.
The value of this setting configures the level of verbosity. Possible values, and their impact are as follows.
| Value | Setting |
| 0 | off. No logging. |
| 1 | Log write operations. |
| 2 | Log read operations. |
| 3 | Log both read and write operations. |
| 7 | Log write and some read operations. |
You can use the mongosniff tool to replay this output for investigation. Given a typical diaglog file, located at /data/db/diaglog.4f76a58c, you might use a command in the following form to read these files:
mongosniff --source DIAGLOG /data/db/diaglog.4f76a58c
diaglog is for internal use and not intended for most users.
Warning
Setting the diagnostic level to 0 will cause mongod to stop writing data to the diagnostic log file. However, the mongod instance will continue to keep the file open, even if it is no longer writing data to the file. If you want to rename, move, or delete the diagnostic log you must cleanly shut down the mongod instance before doing so.
Default: false
Set to true to modify the storage pattern of the data directory to store each database’s files in a distinct folder. This option will create directories within the dbpath named for each database.
Use this option in conjunction with your file system and device configuration so that MongoDB will store data on a number of distinct disk devices to increase write throughput or disk capacity.
Default: (on 64-bit systems) true
Default: (on 32-bit systems) false
Set to true to enable operation journaling to ensure write durability and data consistency.
Set to false to prevent the overhead of journaling in situations where durability is not required. To reduce the impact of the journaling on disk usage, you can leave journal enabled, and set smallfiles to true to reduce the size of the data and journal files.
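For example, a configuration that keeps durability while limiting disk usage might combine the two settings as follows:

```ini
# Keep journaling enabled but reduce the size of data and journal files.
journal = true
smallfiles = true
```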
Default: 100
Set this value to specify the maximum amount of time for mongod to allow between journal operations. The default value is 100 milliseconds. Lower values increase the durability of the journal, at the possible expense of disk performance.
This option accepts values between 2 and 300 milliseconds.
To force mongod to commit to the journal more frequently, you can specify j:true. When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.
Default: false
Set to true to enable IPv6 support and allow clients to connect to mongod using IPv6 networks. IPv6 support is disabled by default in mongod and all utilities.
Default: false
Set to true to permit JSONP access via an HTTP interface. Consider the security implications of allowing this activity before setting this option.
Default: true
Disable authentication. Currently the default. Exists for future compatibility and clarity.
For consistency use the auth option.
Default: false
Set to true to disable the HTTP interface. This setting will override rest and disable the HTTP interface if you specify both.
Changed in version 2.1.2: The nohttpinterface option is not available for mongos instances before 2.1.2
Default: (on 64-bit systems) false
Default: (on 32-bit systems) true
Set nojournal = true to disable durability journaling. By default, mongod enables journaling in 64-bit versions after v2.0.
Default: false
Set noprealloc = true to disable the preallocation of data files. This will shorten the start up time in some cases, but can cause significant performance penalties during normal operations.
Default: false
Set noscripting = true to disable the scripting engine.
Default: false
Set notablescan = true to forbid operations that require a table scan.
Default: 16
Specify this value in megabytes. The maximum size is 2047 megabytes.
Use this setting to control the default size for all newly created namespace files (i.e. .ns). This option has no impact on the size of existing namespace files.
See Limits on namespaces.
Default: 0
Modify this value to change the level of database profiling, which inserts information about operation performance into the output of mongod or the log file if specified by logpath. The following levels are available:
| Level | Setting |
| 0 | Off. No profiling. |
| 1 | On. Only includes slow operations. |
| 2 | On. Includes all operations. |
By default, mongod disables profiling. Database profiling can impact database performance because the profiler must record and process all database operations. Enable this option only after careful consideration.
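For example, to record only slow operations with a custom threshold; the 200 millisecond value here is illustrative:

```ini
# Profile only operations slower than 200 milliseconds (placeholder value).
profile = 1
slowms = 200
```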
Default: false
Set to true to enable a maximum limit for the number of data files each database can have. The default quota is 8 data files when quota is true. Adjust the quota size with the quotaFiles setting.
Default: 8
Modifies the limit on the number of data files per database. This option requires the quota setting.
Default: false
Set to true to run a repair routine on all databases following start up. In general you should set this option on the command line and not in the configuration file or in a control script.
Use the mongod --repair option to access this functionality.
Note
Because mongod rewrites all of the database files during the repair routine, if you do not run repair under the same user account as mongod usually runs, you will need to run chown on your database files to correct the permissions before starting mongod again.
Default: dbpath
Specify the path to the directory containing MongoDB data files, to use in conjunction with the repair setting or mongod --repair operation. Defaults to the value specified by dbpath.
Default: 100
Specify values in milliseconds.
Sets the threshold for mongod to consider a query “slow” for the database profiler. The database logs all slow queries to the log, even when the profiler is not turned on. When the database profiler is on, the profiler writes data to the system.profile collection.
See also
“profile“
Default: false
Set to true to modify MongoDB to use a smaller default data file size. Specifically, smallfiles reduces the initial size for data files and limits them to 512 megabytes. The smallfiles setting also reduces the size of each journal file from 1 gigabyte to 128 megabytes.
Use the smallfiles setting if you have a large number of databases that each hold a small quantity of data. The smallfiles setting can lead mongod to create many files, which may affect performance for larger databases.
Default: 60
mongod writes data very quickly to the journal, and lazily to the data files. syncdelay controls how much time can pass before MongoDB flushes data to the datafiles via an fsync operation. The default setting is 60 seconds. We recommend almost always using the default setting of 60.
The serverStatus command reports the background flush thread’s status via the backgroundFlushing field.
Note
If --syncdelay is 0, mongod flushes all operations to disk immediately, which has a significant impact on performance. Run with journal enabled, which is the default for 64-bit MongoDB builds.
Default: false
When set to true, mongod returns diagnostic system information regarding the page size, the number of physical pages, and the number of available physical pages to standard output.
More typically, run this operation by way of the mongod --sysinfo command. When running with sysinfo, mongod only outputs the page information and no database process will start.
Default: false
When set to true this option upgrades the on-disk data format of the files specified by the dbpath to the latest version, if needed.
This option only affects the operation of mongod if the data files are in an old format.
When specified for a mongos instance, this option updates the metadata format used by the config database.
Note
In most cases you should not set this value, so you can exercise the most control over your upgrade process. See the MongoDB release notes (on the download page) for more information about the upgrade process.
Default: false
For internal diagnostic use only.
Default: <none>
Form: <setname>
Use this setting to configure replication with replica sets. Specify a replica set name as an argument to this set. All hosts must have the same set name.
See also
“Replication,” “Replica Set Administration,” and “Replica Set Configuration“
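For example, every member of a replica set would carry the identical setting; the set name rs0 here is illustrative:

```ini
# Identical on every member of the set; "rs0" is a placeholder set name.
replSet = rs0
```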
Specifies a maximum size in megabytes for the replication operation log (e.g. oplog.) mongod creates an oplog based on the maximum amount of space available. For 64-bit systems, the oplog is typically 5% of available disk space.
Once the mongod has created the oplog for the first time, changing oplogSize will not affect the size of the oplog.
Default: false
In the context of replica set replication, set this option to true if you have seeded this replica with a snapshot of the dbpath of another member of the set. Otherwise the mongod will attempt to perform a full sync.
Warning
If the data is not perfectly synchronized and mongod starts with fastsync, then the secondary or slave will be permanently out of sync with the primary, which may cause significant consistency problems.
New in version 2.2.
Default: all
Values: all, none, and _id_only
You must use replIndexPrefetch in conjunction with replSet.
By default, secondary members of a replica set will load all indexes related to an operation into memory before applying operations from the oplog. You can modify this behavior so that the secondaries will only load the _id index. Specify _id_only to load only the _id index, or none to prevent the mongod from loading any index into memory.
Default: false
Set to true to configure the current instance to act as master instance in a replication configuration.
Default: false
Set to true to configure the current instance to act as slave instance in a replication configuration.
Default: <>
Form: <host>:<port>
Used with the slave setting to specify the master instance from which this slave instance will replicate.
Default: <>
Used with the slave option, the only setting specifies a single database to replicate.
Default: 0
Used with the slave setting, the slavedelay setting configures a “delay,” in seconds, for this slave to wait before applying operations from the master instance.
Default: false
Used with the slave setting, set autoresync to true to force the slave to automatically resync if it is more than 10 seconds behind the master. This setting may be problematic if the oplog is too small (controlled by the --oplogSize option). If the oplog is not large enough to store the difference in changes between the master’s current state and the state of the slave, this instance will forcibly resync itself unnecessarily. When you set the autoresync option, the slave will not attempt an automatic resync more than once in a ten minute period.
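The resync rules just described can be sketched as a small decision function. The helper and its parameter names are illustrative, not mongod internals; the thresholds mirror the text:

```python
# Illustrative sketch of the autoresync policy: resync when the slave is more
# than 10 seconds behind the master, but never more than once per 10 minutes.
RESYNC_LAG_THRESHOLD_S = 10
RESYNC_COOLDOWN_S = 600


def should_autoresync(lag_seconds, now_s, last_resync_s):
    """Return True if this slave should attempt an automatic resync."""
    if lag_seconds <= RESYNC_LAG_THRESHOLD_S:
        return False  # close enough to the master; no resync needed
    if last_resync_s is not None and now_s - last_resync_s < RESYNC_COOLDOWN_S:
        return False  # already resynced within the last ten-minute period
    return True


print(should_autoresync(30, now_s=1000, last_resync_s=None))  # True
print(should_autoresync(30, now_s=1000, last_resync_s=700))   # False: cooldown
```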
Default: false
Set this value to true to configure this mongod instance to operate as the config database of a shard cluster. When running with this option, clients will not be able to write data to any database other than config and admin. The default port for mongod with this option is 27019, and mongod writes all data files to the configdb sub-directory of the dbpath directory.
Default: false
Set this value to true to configure this mongod instance as a shard in a partitioned cluster. The default port for these instances is 27018. The only effect of shardsvr is to change the port number.
Default: false
When set to true, noMoveParanoia disables a “paranoid mode” for data writes during chunk migration operations. See the chunk migration and moveChunk command documentation for more information.
By default mongod will save copies of migrated chunks on the “from” server during migrations as “paranoid mode.” Setting this option disables this paranoia.
Default: None.
Format: <config1>,<config2><:port>,<config3>
Set this option to specify a configuration database (i.e. config database) for the sharded cluster. You must specify either 1 configuration server or 3 configuration servers, in a comma separated list.
This setting only affects mongos processes.
Note
mongos instances read from the first config server in the list provided. All mongos instances must specify the hosts in the configdb setting in the same order.
If your configuration databases reside in more than one data center, order the hosts in this setting so that the config server closest to the majority of your mongos instances is first in the list.
Warning
Never remove a config server from the configdb parameter, even if the config server or servers are not available, or offline.
Default: false
Only runs unit tests and does not start a mongos instance.
This setting only affects mongos processes and is for internal testing use only.
Default: 64
The value of this option determines the size of each chunk of data distributed around the sharded cluster. The default value is 64 megabytes. Larger chunks may lead to an uneven distribution of data, while smaller chunks may lead to frequent and unnecessary migrations. However, in some circumstances it may be necessary to set a different chunk size.
This setting only affects mongos processes. Furthermore, chunkSize only sets the chunk size when initializing the cluster for the first time. If you modify the run-time option later, the new value will have no effect. See the “Modify Chunk Size” procedure if you need to change the chunk size on an existing sharded cluster.
New in version 2.2.
localThreshold affects the logic that mongos uses when selecting replica set members to which to pass read operations from clients. Specify a value for localThreshold in milliseconds. The default value is 15, which corresponds to the default value in all of the client drivers.
This setting only affects mongos processes.
When mongos receives a request that permits reads to secondary members, the mongos will:
1. Find the member of the set with the lowest ping time.
2. Construct a list of replica set members that is within a ping time of 15 milliseconds of the nearest suitable member of the set. If you specify a value for localThreshold, mongos will construct the list of replica members that are within the latency allowed by this value.
3. Select a member to read from at random from this list.
The ping time for a set member, as compared by the localThreshold setting, is a moving average of recent ping times, calculated, at most, every 10 seconds. As a result, some queries may reach members above the threshold until the mongos recalculates the average.
See the Member Selection section of the read preference documentation for more information.
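The member-selection steps above can be sketched as follows. The function and the (host, ping) representation are illustrative, not mongos internals:

```python
import random


def select_member(members, local_threshold_ms=15):
    """Pick a read target from a list of (hostname, avg_ping_ms) tuples.

    Mirrors the documented steps: find the nearest member, keep members
    within local_threshold_ms of it, then choose one at random.
    """
    if not members:
        return None
    nearest = min(ping for _, ping in members)
    candidates = [host for host, ping in members
                  if ping <= nearest + local_threshold_ms]
    return random.choice(candidates)


members = [("db1.example.net", 5), ("db2.example.net", 12), ("db3.example.net", 40)]
# db3.example.net is excluded: its 40 ms ping exceeds 5 ms + 15 ms.
print(select_member(members))
```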
This document describes the URI format for defining connections between applications and MongoDB instances in the official MongoDB drivers.
This section describes the standard format of the MongoDB connection URI used to connect to a MongoDB database server. The format is the same for all official MongoDB drivers. For a list of drivers and links to driver documentation, see Drivers.
The following is the standard URI connection scheme:
mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
The components of this string are:
mongodb://
A required prefix to identify that this is a string in the standard connection format.
username:password@
Optional. If specified, the client will attempt to log in to the specific database using these credentials after connecting to the mongod instance.
host1
This is the only required part of the URI. It identifies a server address to connect to: either a hostname, IP address, or UNIX domain socket.
:port1
Optional. The default value is :27017 if not specified.
hostX
Optional. You can specify as many hosts as necessary. You would specify multiple hosts, for example, for connections to replica sets.
:portX
Optional. The default value is :27017 if not specified.
/database
Optional. The name of the database to authenticate if the connection string includes authentication credentials in the form of username:password@. If /database is not specified and the connection string includes credentials, the driver will authenticate to the admin database.
?options
Connection specific options. See Connection String Options for a full description of these options.
If the connection string does not specify a database, you must still specify a slash (i.e. /) between the last hostN and the question mark that begins the string of options.
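These assembly rules can be sketched as a small helper. The function is hypothetical, for illustration only; it is not a driver API:

```python
def build_uri(hosts, database=None, options=None, credentials=None):
    """Assemble a standard connection URI from its components.

    hosts: list of "host" or "host:port" strings.
    credentials: optional (username, password) tuple.
    """
    uri = "mongodb://"
    if credentials:
        uri += "%s:%s@" % credentials
    uri += ",".join(hosts)
    if database:
        uri += "/" + database
    elif options:
        uri += "/"  # a bare slash is required before ?options with no database
    if options:
        uri += "?" + "&".join("%s=%s" % kv for kv in options.items())
    return uri


print(build_uri(["db1.example.net", "db2.example.net:2500"],
                options={"replicaSet": "test"}))
# mongodb://db1.example.net,db2.example.net:2500/?replicaSet=test
```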
Example
To describe a connection to a replica set named test, with the following mongod hosts:
You would use a connection string that resembles the following:
mongodb://db1.example.net,db2.example.net:2500/?replicaSet=test
This section lists all connection options used in the Standard Connection String Format. The options are not case-sensitive.
Connection options are pairs in the following form: name=value. Separate options with the ampersand (i.e. &) character. In the following example, a connection uses the replicaSet and connectTimeoutMS options:
mongodb://db1.example.net,db2.example.net:2500/?replicaSet=test&connectTimeoutMS=300000
Semi-colon separator for connection string arguments
To provide backwards compatibility, drivers currently accept semi-colons (i.e. ;) as option separators.
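A sketch of option parsing that honors both separators and case-insensitive option names. The helper is illustrative, not a driver API:

```python
def parse_options(query):
    """Parse a connection-string options segment into a dict.

    Accepts both & and the backwards-compatible ; as separators,
    and lower-cases names since options are not case-sensitive.
    """
    options = {}
    for pair in query.replace(";", "&").split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        options[name.lower()] = value
    return options


print(parse_options("replicaSet=test&connectTimeoutMS=300000"))
# {'replicaset': 'test', 'connecttimeoutms': '300000'}
```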
Specifies the name of the replica set, if the mongod is a member of a replica set.
When connecting to a replica set it is important to give a seed list of at least two mongod instances. If you only provide the connection point of a single mongod instance, and omit the replicaSet, the client will create a standalone connection.
true: Initiate the connection with SSL.
false: Initiate the connection without SSL.
The default value is false.
Note
The ssl option is not supported by all drivers. See your driver documentation and the Using MongoDB with SSL Connections document.
Most drivers implement some kind of connection pooling and handle this for you behind the scenes. Some drivers do not support connection pools. See your driver documentation for more information on the connection pooling implementation. These options allow applications to configure the connection pool when connecting to the MongoDB deployment.
The maximum number of connections in the connection pool. The default value is 100.
The minimum number of connections in the connection pool. The default value is 0.
Note
The minPoolSize option is not supported by all drivers. For information on your driver, see the drivers documentation.
The maximum number of milliseconds that a connection can remain idle in the pool before being removed and closed.
This option is not supported by all drivers.
A number that the driver multiplies the maxPoolSize value by to provide the maximum number of threads allowed to wait for a connection to become available from the pool. For default values, see the Drivers documentation.
Write concern describes the kind of assurances that mongod and the driver provide to the application regarding the success and durability of the write operation. For a full explanation of write concern and write operations in general, see Write Operations:
Defines the level and kind of write concern that the driver uses when calling getLastError. This option can take either a number or a string as a value.
The time in milliseconds to wait for replication to succeed, as specified in the w option, before timing out.
Controls whether write operations will wait until the mongod acknowledges the write operations and commits the data to the on-disk journal.
Read preferences describe the behavior of read operations with regards to replica sets. These parameters allow you to specify read preferences on a per-connection basis in the connection string:
Specifies the replica set read preference for this connection. This setting overrides any slaveOk value. The read preference values are the following:
For descriptions of each value, see Read Preference Modes.
The default value is primary, which sends all read operations to the replica set’s primary.
Specifies a tag set as a comma-separated list of colon-separated key-value pairs. For example:
dc:ny,rack:1
To specify a list of tag sets, use multiple readPreferenceTags. The following specifies two tag sets and an empty tag set:
readPreferenceTags=dc:ny,rack:1&readPreferenceTags=dc:ny&readPreferenceTags=
Order matters when using multiple readPreferenceTags.
For the default, see the drivers documentation for your driver.
Note
Not all drivers support the uuidRepresentation option. For information on your driver, see the drivers documentation.
Consider the following example MongoDB URI strings that specify common connections:
Connect to a database server running locally on the default port:
mongodb://localhost
Connect and log in to the admin database as user sysop with the password moon:
mongodb://sysop:moon@localhost
Connect and log in to the records database as user sysop with the password moon:
mongodb://sysop:moon@localhost/records
Connect to a UNIX domain socket:
mongodb:///tmp/mongodb-27017.sock
Note
Not all drivers support UNIX domain sockets. For information on your driver, see the drivers documentation.
Connect to a replica set with two members, one on db1.example.net and the other on db2.example.net:
mongodb://db1.example.net,db2.example.com
Connect to a replica set with three members running on localhost, on ports 27017, 27018, and 27019:
mongodb://localhost,localhost:27018,localhost:27019
Connect to a replica set with three members. Send all writes to the primary and distribute reads to the secondaries:
mongodb://example1.com,example2.com,example3.com/?readPreference=secondary
Connect to a replica set with write concern configured to wait for replication to succeed on at least two members, with a two-second timeout.
mongodb://example1.com,example2.com,example3.com/?w=2&wtimeoutMS=2000
This document provides a quick overview and example of the serverStatus command. The helper db.serverStatus() in the mongo shell provides access to this output. For full documentation of the content of this output, see Server Status Reference.
Note
The fields included in this output vary slightly depending on the version of MongoDB, underlying operating system platform, and the kind of node, including mongos, mongod or replica set member.
The “Instance Information” section displays information regarding the specific mongod or mongos instance and its state.
{
"host" : "<hostname>",
"version" : "<version>",
"process" : "<mongod|mongos>",
"pid" : <num>,
"uptime" : <num>,
"uptimeMillis" : <num>,
"uptimeEstimate" : <num>,
"localTime" : ISODate(""),
The “locks” section reports data that reflect the state and use of both global (i.e. .) and database specific locks:
"locks" : {
"." : {
"timeLockedMicros" : {
"R" : <num>,
"W" : <num>
},
"timeAcquiringMicros" : {
"R" : <num>,
"W" : <num>
}
},
"admin" : {
"timeLockedMicros" : {
"r" : <num>,
"w" : <num>
},
"timeAcquiringMicros" : {
"r" : <num>,
"w" : <num>
}
},
"local" : {
"timeLockedMicros" : {
"r" : <num>,
"w" : <num>
},
"timeAcquiringMicros" : {
"r" : <num>,
"w" : <num>
}
},
"<database>" : {
"timeLockedMicros" : {
"r" : <num>,
"w" : <num>
},
"timeAcquiringMicros" : {
"r" : <num>,
"w" : <num>
}
}
},
The “globalLock” field reports on MongoDB’s global system lock. In most cases the locks document provides more fine grained data that reflects lock use:
"globalLock" : {
"totalTime" : <num>,
"lockTime" : <num>,
"currentQueue" : {
"total" : <num>,
"readers" : <num>,
"writers" : <num>
},
"activeClients" : {
"total" : <num>,
"readers" : <num>,
"writers" : <num>
}
},
The “mem” field reports on MongoDB’s current memory use:
"mem" : {
"bits" : <num>,
"resident" : <num>,
"virtual" : <num>,
"supported" : <boolean>,
"mapped" : <num>,
"mappedWithJournal" : <num>
},
The “connections” field reports on the current connection use by the MongoDB process:
"connections" : {
"current" : <num>,
"available" : <num>
},
The fields in the “extra_info” document provide platform specific information. The following example block is from a Linux-based system:
"extra_info" : {
"note" : "fields vary by platform",
"heap_usage_bytes" : <num>,
"page_faults" : <num>
},
The “indexCounters” document reports on index use:
"indexCounters" : {
"btree" : {
"accesses" : <num>,
"hits" : <num>,
"misses" : <num>,
"resets" : <num>,
"missRatio" : <num>
}
},
The “backgroundFlushing” document reports on the process MongoDB uses to write data to disk:
"backgroundFlushing" : {
"flushes" : <num>,
"total_ms" : <num>,
"average_ms" : <num>,
"last_ms" : <num>,
"last_finished" : ISODate("")
},
The “cursors” document reports on current cursor use and state:
"cursors" : {
"totalOpen" : <num>,
"clientCursors_size" : <num>,
"timedOut" : <num>
},
The “network” document reports on network use and state:
"network" : {
"bytesIn" : <num>,
"bytesOut" : <num>,
"numRequests" : <num>
},
The “repl” document reports on the state of replication and the replica set. This document only appears for replica sets.
"repl" : {
"setName" : "<string>",
"ismaster" : <boolean>,
"secondary" : <boolean>,
"hosts" : [
<hostname>,
<hostname>,
<hostname>
],
"primary" : <hostname>,
"me" : <hostname>
},
The “opcountersRepl” document reports the number of replicated operations:
"opcountersRepl" : {
"insert" : <num>,
"query" : <num>,
"update" : <num>,
"delete" : <num>,
"getmore" : <num>,
"command" : <num>
},
The “replNetworkQueue” document holds information regarding the queue that secondaries use to poll data from other members of their set:
"replNetworkQueue" : {
"waitTimeMs" : <num>,
"numElems" : <num>,
"numBytes" : <num>
},
The “opcounters” document reports the number of operations this MongoDB instance has processed:
"opcounters" : {
"insert" : <num>,
"query" : <num>,
"update" : <num>,
"delete" : <num>,
"getmore" : <num>,
"command" : <num>
},
The “asserts” document reports the number of assertions or errors produced by the server:
"asserts" : {
"regular" : <num>,
"warning" : <num>,
"msg" : <num>,
"user" : <num>,
"rollovers" : <num>
},
The “writeBacksQueued” document reports the number of writebacks:
"writeBacksQueued" : <num>,
The “dur” document reports on data that reflect this mongod instance’s journaling-related operations and performance during a journal group commit interval:
"dur" : {
"commits" : <num>,
"journaledMB" : <num>,
"writeToDataFilesMB" : <num>,
"compression" : <num>,
"commitsInWriteLock" : <num>,
"earlyCommits" : <num>,
"timeMs" : {
"dt" : <num>,
"prepLogBuffer" : <num>,
"writeToJournal" : <num>,
"writeToDataFiles" : <num>,
"remapPrivateView" : <num>
}
},
The “recordStats” document reports data on MongoDB’s ability to predict page faults and yield write operations when required data isn’t in memory:
"recordStats" : {
"accessesNotInMemory" : <num>,
"pageFaultExceptionsThrown" : <num>,
"local" : {
"accessesNotInMemory" : <num>,
"pageFaultExceptionsThrown" : <num>
},
"<database>" : {
"accessesNotInMemory" : <num>,
"pageFaultExceptionsThrown" : <num>
}
},
The final ok field holds the return status for the serverStatus command:
"ok" : 1
}
The serverStatus command returns a collection of information that reflects the database’s status. These data are useful for diagnosing and assessing the performance of your MongoDB instance. This reference catalogs each datum included in the output of this command and provides context for using this data to more effectively administer your database.
See also
Much of the output of serverStatus is also displayed dynamically by mongostat. See the mongostat command for more information.
For examples of the serverStatus output, see Server Status Output Index.
The host field contains the system’s hostname. In Unix/Linux systems, this should be the same as the output of the hostname command.
The version field contains the version of MongoDB running on the current mongod or mongos instance.
The process field identifies which kind of MongoDB instance is running. Possible values are:
The value of the uptime field corresponds to the number of seconds that the mongos or mongod process has been active.
uptimeEstimate provides the uptime as calculated from MongoDB’s internal coarse-grained time keeping system.
New in version 2.1.2: All locks statuses first appeared in the 2.1.2 development release for the 2.2 series.
Example
The locks document contains sub-documents that provide a granular report on MongoDB database-level lock use. All values are of the NumberLong() type.
Generally, fields named R and W report on MongoDB’s global lock state, while fields named r and w report on lock state for a specific database.
If a document does not have any fields, it means that no locks have existed with this context since the last time the mongod started.
The first document in locks, in a field named . (i.e. a period), contains information about the global lock as well as aggregated data regarding lock use in all databases.
The locks...timeLockedMicros document reports the amount of time in microseconds that a lock has existed in all databases in this mongod instance.
The R field reports the amount of time in microseconds that any database has held the global read lock.
The W field reports the amount of time in microseconds that any database has held the global write lock.
The r field reports the amount of time in microseconds that any database has held the local read lock.
The w field reports the amount of time in microseconds that any database has held the local write lock.
The locks...timeAcquiringMicros document reports the amount of time in microseconds that operations have spent waiting to acquire a lock in all databases in this mongod instance.
The R field reports the amount of time in microseconds that any database has spent waiting for the global read lock.
The W field reports the amount of time in microseconds that any database has spent waiting for the global write lock.
The locks.admin document contains two sub-documents that report data regarding lock use in the admin database.
The locks.admin.timeLockedMicros document reports the amount of time in microseconds that locks have existed in the context of the admin database.
The r field reports the amount of time in microseconds that the admin database has held the read lock.
The w field reports the amount of time in microseconds that the admin database has held the write lock.
The locks.admin.timeAcquiringMicros document reports on the amount of time in microseconds that operations have spent waiting to acquire a lock for the admin database.
The r field reports the amount of time in microseconds that operations have spent waiting to acquire a read lock on the admin database.
The w field reports the amount of time in microseconds that operations have spent waiting to acquire a write lock on the admin database.
The locks.local document contains two sub-documents that report data regarding lock use in the local database. The local database stores instance-specific data, including the oplog for replication.
The locks.local.timeLockedMicros document reports on the amount of time in microseconds that locks have existed in the context of the local database.
The r field reports the amount of time in microseconds that the local database has held the read lock.
The w field reports the amount of time in microseconds that the local database has held the write lock.
The locks.local.timeAcquiringMicros document reports on the amount of time in microseconds that operations have spent waiting to acquire a lock for the local database.
The r field reports the amount of time in microseconds that operations have spent waiting to acquire a read lock on the local database.
The w field reports the amount of time in microseconds that operations have spent waiting to acquire a write lock on the local database.
For each additional database locks includes a document that reports on the lock use for this database. The names of these documents reflect the database name itself.
The locks.<database>.timeLockedMicros document reports on the amount of time in microseconds that locks have existed in the context of the <database> database.
The r field reports the amount of time in microseconds that the <database> database has held the read lock.
The w field reports the amount of time in microseconds that the <database> database has held the write lock.
The locks.<database>.timeAcquiringMicros document reports on the amount of time in microseconds that operations have spent waiting to acquire a lock for the <database> database.
The r field reports the amount of time in microseconds that operations have spent waiting to acquire a read lock on the <database> database.
The w field reports the amount of time in microseconds that operations have spent waiting to acquire a write lock on the <database> database.
Example
The globalLock data structure contains information regarding the database’s current lock state, historical lock status, current operation queue, and the number of active clients.
The value of globalLock.totalTime represents the time, in microseconds, since the database last started and the globalLock was created. This is roughly equivalent to total server uptime.
The value of globalLock.lockTime represents the time, in microseconds, since the database last started, that the globalLock has been held.
Consider this value in combination with the value of globalLock.totalTime. MongoDB aggregates these values in the globalLock.ratio value. If the globalLock.ratio value is small but globalLock.totalTime is high, the globalLock has typically been held frequently for shorter periods of time, which may be indicative of a more normal use pattern. If the globalLock.lockTime is higher and the globalLock.totalTime is smaller (relatively), then fewer operations are responsible for a greater portion of the server’s use.
Changed in version 2.2: globalLock.ratio was removed. See locks.
The value of globalLock.ratio displays the relationship between globalLock.lockTime and globalLock.totalTime.
Low values indicate that operations have held the globalLock frequently for shorter periods of time. High values indicate that operations have held globalLock infrequently for longer periods of time.
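As a rough sketch of this relationship, the historical ratio can be computed directly from the two counters. The helper name is illustrative, not part of MongoDB:

```python
# Illustrative helper computing the historical globalLock.ratio:
# the fraction of server uptime during which the global lock was held.
def global_lock_ratio(lock_time_us, total_time_us):
    if total_time_us == 0:
        return 0.0
    return lock_time_us / total_time_us


# e.g. 5 seconds of lock time over 100 seconds of uptime:
print(global_lock_ratio(5_000_000, 100_000_000))  # 0.05
```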
The globalLock.currentQueue data structure value provides more granular information concerning the number of operations queued because of a lock.
The value of globalLock.currentQueue.total provides a combined total of operations queued waiting for the lock.
A consistently small queue, particularly of shorter operations should cause no concern. Also, consider this value in light of the size of queue waiting for the read lock (e.g. globalLock.currentQueue.readers) and write-lock (e.g. globalLock.currentQueue.writers) individually.
The value of globalLock.currentQueue.readers is the number of operations that are currently queued and waiting for the read-lock. A consistently small read-queue, particularly of shorter operations should cause no concern.
The value of globalLock.currentQueue.writers is the number of operations that are currently queued and waiting for the write-lock. A consistently small write-queue, particularly of shorter operations is no cause for concern.
The globalLock.activeClients data structure provides more granular information about the number of connected clients and the operation types (e.g. read or write) performed by these clients.
Use this data to provide context for the currentQueue data.
The value of globalLock.activeClients.total is the total number of active client connections to the database. This combines clients that are performing read operations (e.g. globalLock.activeClients.readers) and clients that are performing write operations (e.g. globalLock.activeClients.writers).
The value of globalLock.activeClients.readers contains a count of the active client connections performing read operations.
The value of globalLock.activeClients.writers contains a count of active client connections performing write operations.
Example
The mem data structure holds information regarding the target system architecture of mongod and current memory use.
The value of mem.bits is either 64 or 32, depending on the target architecture specified during the mongod compilation process. In most instances this is 64, and this value does not change over time.
The value of mem.resident is roughly equivalent to the amount of RAM, in bytes, currently used by the database process. In normal use this value tends to grow. In dedicated database servers this number tends to approach the total amount of system memory.
mem.virtual displays the quantity, in megabytes (MB), of virtual memory used by the mongod process. In typical deployments this value is slightly larger than mem.mapped. If this value is significantly (i.e. gigabytes) larger than mem.mapped, this could indicate a memory leak.
With journaling enabled, the value of mem.virtual is at least twice the value of mem.mapped.
mem.supported is true when the underlying system supports extended memory information. If this value is false and the system does not support extended memory information, then other mem values may not be accessible to the database server.
The value of mem.mapped provides the amount of mapped memory, in megabytes (MB), by the database. Because MongoDB uses memory-mapped files, this value is likely to be roughly equivalent to the total size of your database or databases.
mem.mappedWithJournal provides the amount of mapped memory, in megabytes (MB), including the memory used for journaling. This value will always be twice the value of mem.mapped. This field is only included if journaling is enabled.
Example
The connections sub-document reports data regarding the current connection status and availability of the database server. Use these values to assess the current load and capacity requirements of the server.
The value of connections.current corresponds to the number of connections to the database server from clients. This number includes the current shell session. Consider the value of connections.available to add more context to this datum.
This figure will include the current shell connection as well as any inter-node connections to support a replica set or sharded cluster.
connections.available provides a count of the number of unused available connections that the database can provide. Consider this value in combination with the value of connections.current to understand the connection load on the database, and the Linux ulimit Settings document for more information about system thresholds on available connections.
Example
The extra_info data structure holds data collected by the mongod instance about the underlying system. Your system may only report a subset of these fields.
The field extra_info.note reports that the data in this structure depend on the underlying platform, and has the text: “fields vary by platform.”
The extra_info.heap_usage_bytes field is only available on Unix/Linux systems, and reports the total size in bytes of heap space used by the database process.
The extra_info.page_faults field is only available on Unix/Linux systems, and reports the total number of page faults that require disk operations. Page faults refer to operations that require the database server to access data which isn’t available in active memory. The page_fault counter may increase dramatically during moments of poor performance and may correlate with limited memory environments and larger data sets. Limited and sporadic page faults do not necessarily indicate an issue.
Example
Changed in version 2.2: Previously, data in the indexCounters document reported sampled data, and were only useful in relative comparison to each other, because they could not reflect absolute index use. In 2.2 and later, these data reflect actual index use.
The indexCounters data structure reports information regarding the state and use of indexes in MongoDB.
The indexCounters.btree data structure contains data regarding MongoDB’s btree indexes.
indexCounters.btree.accesses reports the number of times that operations have accessed indexes. This value is the sum of indexCounters.btree.hits and indexCounters.btree.misses. Higher values indicate that your database has indexes and that queries are taking advantage of these indexes. If this number does not grow over time, this might indicate that your indexes do not effectively support your use.
The indexCounters.btree.hits value reflects the number of times that an index has been accessed and mongod is able to return the index from memory.
A higher value indicates effective index use. indexCounters.btree.hits values that represent a greater proportion of the indexCounters.btree.accesses value, tend to indicate more effective index configuration.
The indexCounters.btree.misses value represents the number of times that an operation attempted to access an index that was not in memory. These “misses” do not indicate a failed query or operation, but rather inefficient use of the index. Lower values in this field indicate better index use, and likely better overall performance as well.
The indexCounters.btree.resets value reflects the number of times that the index counters have been reset since the database last restarted. Typically this value is 0, but use this value to provide context for the data specified by other indexCounters values.
The indexCounters.btree.missRatio value is the ratio of indexCounters.btree.misses to indexCounters.btree.hits. This value is typically 0 or approaching 0.
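As a sketch, these relationships can be checked against a hypothetical indexCounters.btree sample; the numbers below are invented for illustration only:

```javascript
// Hypothetical sample of the indexCounters.btree document (invented values).
const btree = { hits: 9950, misses: 50, resets: 0 };

// accesses is the sum of hits and misses.
const accesses = btree.hits + btree.misses;

// missRatio relates misses to hits; healthy values are at or near 0.
const missRatio = btree.misses / btree.hits;

console.log(accesses, missRatio.toFixed(4));
```

A missRatio that climbs over time suggests that a growing share of index reads must go to disk.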
mongod periodically flushes writes to disk. In the default configuration, this happens every 60 seconds. The backgroundFlushing data structure contains data regarding these operations. Consider these values if you have concerns about write performance and journaling.
backgroundFlushing.flushes is a counter that collects the number of times the database has flushed all writes to disk. This value will grow as the database runs for longer periods of time.
The backgroundFlushing.total_ms value provides the total number of milliseconds (ms) that the mongod processes have spent writing (i.e. flushing) data to disk. Because this is an absolute value, consider the value of backgroundFlushing.flushes and backgroundFlushing.average_ms to provide better context for this datum.
The backgroundFlushing.average_ms value describes the relationship between the number of flushes and the total amount of time that the database has spent writing data to disk. The larger backgroundFlushing.flushes is, the more likely this value is to represent a “normal” time; however, abnormal data can skew this value.
Use backgroundFlushing.last_ms to ensure that a high average is not skewed by a transient historical issue or a random write distribution.
The value of the backgroundFlushing.last_ms field is the amount of time, in milliseconds, that the last flush operation took to complete. Use this value to verify that the current performance of the server is in line with the historical data provided by backgroundFlushing.average_ms and backgroundFlushing.total_ms.
The backgroundFlushing.last_finished field provides a timestamp of the last completed flush operation in the ISODate format. If this value is more than a few minutes old relative to your server’s current time and accounting for differences in time zone, restarting the database may result in some data loss.
Also consider ongoing operations that might skew this value by routinely blocking write operations.
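The relationships among these fields can be sketched against a hypothetical backgroundFlushing sample; all values below are invented:

```javascript
// Hypothetical backgroundFlushing document (invented values).
const flushing = {
  flushes: 10000,   // number of flush operations since startup
  total_ms: 150000, // cumulative time spent flushing, in ms
  last_ms: 17,      // duration of the most recent flush, in ms
};

// average_ms is total_ms divided by the number of flushes.
const computedAverage = flushing.total_ms / flushing.flushes;

// Compare last_ms to the average to spot an unusually slow recent flush.
const lastIsTypical = flushing.last_ms < 2 * computedAverage;

console.log(computedAverage, lastIsTypical);
```

Here the "2 ×" threshold is an arbitrary illustration, not a MongoDB-defined limit.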
Example output of the cursors fields.
The cursors data structure contains data regarding cursor state and use.
cursors.totalOpen provides the number of cursors that MongoDB is maintaining for clients. Because MongoDB exhausts unused cursors, this value is typically small or zero. However, if there is a queue, stale tailable cursors, or a large number of operations, this value may rise.
Deprecated since version 1.x: See cursors.totalOpen for this datum.
cursors.timedOut provides a counter of the total number of cursors that have timed out since the server process started. If this number is large or growing at a regular rate, this may indicate an application error.
Example
The network data structure contains data regarding MongoDB’s network use.
The value of the network.bytesIn field reflects the amount of network traffic, in bytes, received by this database. Use this value to ensure that network traffic sent to the mongod process is consistent with expectations and overall inter-application traffic.
The value of the network.bytesOut field reflects the amount of network traffic, in bytes, sent from this database. Use this value to ensure that network traffic sent by the mongod process is consistent with expectations and overall inter-application traffic.
The network.numRequests field is a counter of the total number of distinct requests that the server has received. Use this value to provide context for the network.bytesIn and network.bytesOut values to ensure that MongoDB’s network utilization is consistent with expectations and application use.
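For example, numRequests can be used to derive average request and response sizes, which are often easier to reason about than raw byte totals. The sample counters below are invented:

```javascript
// Hypothetical network counters from serverStatus output (invented values).
const network = { bytesIn: 52000000, bytesOut: 261000000, numRequests: 130000 };

// Average request and response sizes, in bytes, provide context for the
// raw byte counters.
const avgRequestBytes = network.bytesIn / network.numRequests;
const avgResponseBytes = network.bytesOut / network.numRequests;

console.log(avgRequestBytes, avgResponseBytes);
```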
Example
The repl data structure contains status information for MongoDB’s replication (i.e. “replica set”) configuration. These values only appear when the current host has replication enabled.
See Replication Fundamentals for more information on replication.
The repl.setName field contains a string with the name of the current replica set. This value reflects the --replSet command line argument, or replSet value in the configuration file.
See Replication Fundamentals for more information on replication.
The value of the repl.ismaster field is either true or false and reflects whether the current node is the master or primary node in the replica set.
See Replication Fundamentals for more information on replication.
The value of the repl.secondary field is either true or false and reflects whether the current node is a secondary node in the replica set.
See Replication Fundamentals for more information on replication.
repl.hosts is an array that lists the other nodes in the current replica set. Each member of the replica set appears in the form of hostname:port.
See Replication Fundamentals for more information on replication.
Example
The opcountersRepl data structure, similar to the opcounters data structure, provides an overview of database replication operations by type and makes it possible to analyze the load on the replica in a more granular manner. These values only appear when the current host has replication enabled.
These values will differ from the opcounters values because of how MongoDB serializes operations during replication. See Replication Fundamentals for more information on replication.
These numbers will grow over time in response to database use. Analyze these values over time to track database utilization.
opcountersRepl.insert provides a counter of the total number of replicated insert operations since the mongod instance last started.
opcountersRepl.query provides a counter of the total number of replicated queries since the mongod instance last started.
opcountersRepl.update provides a counter of the total number of replicated update operations since the mongod instance last started.
opcountersRepl.delete provides a counter of the total number of replicated delete operations since the mongod instance last started.
opcountersRepl.getmore provides a counter of the total number of “getmore” operations since the mongod instance last started. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process.
opcountersRepl.command provides a counter of the total number of replicated commands issued to the database since the mongod instance last started.
New in version 2.1.2.
The replNetworkQueue document reports on the network replication buffer, which permits replication operations to happen in the background. This feature is internal.
This document only appears on secondary members of replica sets.
replNetworkQueue.waitTimeMs reports the amount of time that a secondary waits to add operations to the network queue. This value is cumulative.
replNetworkQueue.numElems reports the number of operations stored in the queue.
replNetworkQueue.numBytes reports the total size of the network replication queue.
Example
The opcounters data structure provides an overview of database operations by type and makes it possible to analyze the load on the database in a more granular manner.
These numbers will grow over time in response to database use. Analyze these values over time to track database utilization.
opcounters.insert provides a counter of the total number of insert operations since the mongod instance last started.
opcounters.query provides a counter of the total number of queries since the mongod instance last started.
opcounters.update provides a counter of the total number of update operations since the mongod instance last started.
opcounters.delete provides a counter of the total number of delete operations since the mongod instance last started.
opcounters.getmore provides a counter of the total number of “getmore” operations since the mongod instance last started. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process.
opcounters.command provides a counter of the total number of commands issued to the database since the mongod instance last started.
Example
The asserts document reports the number of asserts on the database. While assert errors are typically uncommon, if there are non-zero values for the asserts, you should check the log file for the mongod process for more information. In many cases these errors are trivial, but are worth investigating.
The asserts.regular counter tracks the number of regular assertions raised since the server process started. Check the log file for more information about these messages.
The asserts.warning counter tracks the number of warnings raised since the server process started. Check the log file for more information about these warnings.
The asserts.msg counter tracks the number of message assertions raised since the server process started. Check the log file for more information about these messages.
The asserts.user counter reports the number of “user asserts” that have occurred since the last time the server process started. These are errors that a user may generate, such as running out of disk space or a duplicate key error. You can prevent these assertions by fixing a problem with your application or deployment. Check the MongoDB log for more information.
The asserts.rollovers counter displays the number of times that the assert counters have rolled over since the last time the server process started. The counters will roll over to zero after 2^30 assertions. Use this value to provide context to the other values in the asserts data structure.
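Because each counter wraps to zero after 2^30 assertions, the rollovers field is needed to recover an approximate lifetime total. A hedged sketch, using invented values:

```javascript
// Hypothetical asserts document (invented values).
const asserts = { regular: 5, warning: 0, msg: 0, user: 12, rollovers: 1 };

// If a counter has rolled over, its approximate lifetime total is
// rollovers * 2^30 plus the current counter value. This treats every
// rollover as applying to this counter, which is a simplification.
const lifetimeRegular = asserts.rollovers * Math.pow(2, 30) + asserts.regular;

console.log(lifetimeRegular);
```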
The value of writeBacksQueued is true when there are operations from a mongos instance queued for retrying. Typically this value is false.
See also
New in version 1.8.
Example
The dur (for “durability”) document contains data regarding the mongod‘s journaling-related operations and performance. mongod must be running with journaling enabled for these data to appear in the output of “serverStatus”.
Note
The data values are not cumulative but are reset on a regular basis as determined by the journal group commit interval. This interval is ~100 milliseconds (ms) by default (or 30ms if the journal file is on the same file system as your data files) and is cut by 1/3 when there is a getLastError command pending. The interval is configurable using the --journalCommitInterval option.
See also
“Journaling” for more information about journaling operations.
The dur.commits provides the number of transactions written to the journal during the last journal group commit interval.
The dur.journaledMB provides the amount of data in megabytes (MB) written to journal during the last journal group commit interval.
The dur.writeToDataFilesMB provides the amount of data in megabytes (MB) written from journal to the data files during the last journal group commit interval.
New in version 2.0.
The dur.compression represents the compression ratio of the data written to the journal:
( journaled_size_of_data / uncompressed_size_of_data )
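As an illustration, the ratio can be recovered from the compressed and uncompressed byte counts for one commit interval; the sizes below are invented:

```javascript
// Hypothetical sizes from one journal group commit interval (invented).
const journaledBytes = 1.2 * 1024 * 1024;    // compressed size written to the journal
const uncompressedBytes = 2.4 * 1024 * 1024; // size of the same data before compression

// dur.compression = journaled size / uncompressed size; values below 1
// indicate that the journaled data compressed well.
const compression = journaledBytes / uncompressedBytes;

console.log(compression);
```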
The dur.commitsInWriteLock provides a count of the commits that occurred while a write lock was held. Commits in a write lock indicate a MongoDB node under a heavy write load and call for further diagnosis.
The dur.earlyCommits value reflects the number of times MongoDB requested a commit before the scheduled journal group commit interval. Use this value to ensure that your journal group commit interval is not too long for your deployment.
The dur.timeMS document provides information about the performance of the mongod instance during the various phases of journaling in the last journal group commit interval.
The dur.timeMS.dt value provides, in milliseconds, the amount of time over which MongoDB collected the dur.timeMS data. Use this field to provide context to the other dur.timeMS field values.
The dur.timeMS.prepLogBuffer value provides, in milliseconds, the amount of time spent preparing to write to the journal. Smaller values indicate better journal performance.
The dur.timeMS.writeToJournal value provides, in milliseconds, the amount of time spent actually writing to the journal. File system speeds and device interfaces can affect performance.
The dur.timeMS.writeToDataFiles value provides, in milliseconds, the amount of time spent writing to data files after journaling. File system speeds and device interfaces can affect performance.
The dur.timeMS.remapPrivateView value provides, in milliseconds, the amount of time spent remapping copy-on-write memory mapped views. Smaller values indicate better journal performance.
Example output of the recordStats fields.
The recordStats document provides fine grained reporting on page faults on a per database level.
recordStats.accessesNotInMemory reflects the number of times mongod needed to access a memory page that was not resident in memory for all databases managed by this mongod instance.
recordStats.pageFaultExceptionsThrown reflects the number of page fault exceptions thrown by mongod when accessing data for all databases managed by this mongod instance.
recordStats.local.accessesNotInMemory reflects the number of times mongod needed to access a memory page that was not resident in memory for the local database.
recordStats.local.pageFaultExceptionsThrown reflects the number of page fault exceptions thrown by mongod when accessing data for the local database.
recordStats.admin.accessesNotInMemory reflects the number of times mongod needed to access a memory page that was not resident in memory for the admin database.
recordStats.admin.pageFaultExceptionsThrown reflects the number of page fault exceptions thrown by mongod when accessing data for the admin database.
recordStats.<database>.accessesNotInMemory reflects the number of times mongod needed to access a memory page that was not resident in memory for the <database> database.
recordStats.<database>.pageFaultExceptionsThrown reflects the number of page fault exceptions thrown by mongod when accessing data for the <database> database.
MongoDB can report data that reflects the current state of the “active” database. In this context, “database” refers to a single MongoDB database. To run dbStats issue this command in the shell:
db.runCommand( { dbStats: 1 } )
The mongo shell provides the helper function db.stats(). Use the following form:
db.stats()
The above commands are equivalent. Without any arguments, db.stats() returns values in bytes. To convert the returned values to kilobytes, use the scale argument:
db.stats(1024)
Or:
db.runCommand( { dbStats: 1, scale: 1024 } )
Note
Because scaling rounds values to whole numbers, scaling may return unlikely or unexpected results.
The above commands are equivalent. See the dbStats database command and the db.stats() helper for the mongo shell for additional information.
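The effect of scaling can be illustrated outside the shell. The byte count below is invented, and whether MongoDB truncates or rounds is assumed here to be truncation:

```javascript
// Hypothetical dbStats value, in bytes (invented).
const dataSizeBytes = 5000;

// Scaling divides by the scale factor and reduces the result to a whole
// number, so db.stats(1024) can report 4 here even though the unscaled
// value is closer to 4.88 KB.
const scale = 1024;
const scaledDataSize = Math.floor(dataSizeBytes / scale);

console.log(scaledDataSize);
```

This loss of precision is why the note above warns that scaled output can look unexpected.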
Contains the name of the database.
Contains a count of the number of collections in that database.
Contains a count of the number of objects (i.e. documents) in the database across all collections.
The average size of each object. The scale argument affects this value. This is the dataSize divided by the number of objects.
The total size of the data held in this database including the padding factor. The scale argument affects this value. The dataSize will not decrease when documents shrink, but will decrease when you remove documents.
The total amount of space allocated to collections in this database for document storage. The scale argument affects this value. The storageSize does not decrease as you remove or shrink documents.
Contains a count of the number of extents in the database across all collections.
Contains a count of the total number of indexes across all collections in the database.
The total size of all indexes created on this database. The scale argument affects this value.
The total size of the data files that hold the database. This value includes preallocated space and the padding factor. The value of fileSize only reflects the size of the data files for the database and not the namespace file.
The scale argument affects this value.
The total size of the namespace files (i.e. that end with .ns) for this database. You cannot change the size of the namespace file after creating a database, but you can change the default size for all new namespace files with the nssize runtime option.
See also
The nssize option, and Maximum Namespace File Size
To fetch collection statistics, call the db.collection.stats() method on a collection object in the mongo shell:
db.collection.stats()
You may also use the literal command format:
db.runCommand( { collStats: "collection" } )
Replace collection in both examples with the name of the collection you want statistics for. By default, the return values will appear in terms of bytes. You can, however, enter a scale argument. For example, you can convert the return values to kilobytes like so:
db.collection.stats(1024)
Or:
db.runCommand( { collStats: "collection", scale: 1024 } )
Note
The scale argument rounds values to whole numbers. This can produce unpredictable and unexpected results in some situations.
See also
The documentation of the “collStats” command and the “db.collection.stats(),” method in the mongo shell.
The output of db.collection.stats() resembles the following:
{
"ns" : "<database>.<collection>",
"count" : <number>,
"size" : <number>,
"avgObjSize" : <number>,
"storageSize" : <number>,
"numExtents" : <number>,
"nindexes" : <number>,
"lastExtentSize" : <number>,
"paddingFactor" : <number>,
"systemFlags" : <bit>,
"userFlags" : <bit>,
"totalIndexSize" : <number>,
"indexSizes" : {
"_id_" : <number>,
"a_1" : <number>
},
"ok" : 1
}
The namespace of the current collection, which follows the format [database].[collection].
The number of objects or documents in this collection.
The size of the data stored in this collection. This value does not include the size of any indexes associated with the collection, which the totalIndexSize field reports.
The scale argument affects this value.
The average size of an object in the collection. The scale argument affects this value.
The total amount of storage allocated to this collection for document storage. The scale argument affects this value. The storageSize does not decrease as you remove or shrink documents.
The total number of contiguously allocated data file regions.
The number of indexes on the collection. All collections have at least one index on the _id field.
Changed in version 2.2: Before 2.2, capped collections did not necessarily have an index on the _id field, and some capped collections created with pre-2.2 versions of mongod may not have an _id index.
The size of the last extent allocated. The scale argument affects this value.
The amount of space added to the end of each document at insert time. The document padding provides a small amount of extra space on disk to allow a document to grow slightly without needing to move the document. mongod automatically calculates this padding factor.
Changed in version 2.2: Removed in version 2.2 and replaced with the userFlags and systemFlags fields.
Indicates the number of flags on the current collection. In version 2.0, the only flag notes the existence of an index on the _id field.
New in version 2.2.
Reports the flags on this collection that reflect internal server options. Typically this value is 1 and reflects the existence of an index on the _id field.
New in version 2.2.
Reports the flags on this collection set by the user. In version 2.2 the only user flag is usePowerOf2Sizes. If usePowerOf2Sizes is enabled, userFlags will be set to 1, otherwise userFlags will be 0.
See the collMod command for more information on setting user flags and usePowerOf2Sizes.
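Since userFlags is a bit field, usePowerOf2Sizes can be read with a bitwise test; the value below is invented:

```javascript
// Hypothetical userFlags value from collStats output (invented).
const userFlags = 1;

// In 2.2 the only user flag is usePowerOf2Sizes, stored in bit 0.
const usePowerOf2Sizes = (userFlags & 1) === 1;

console.log(usePowerOf2Sizes);
```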
The total size of all indexes. The scale argument affects this value.
This field specifies the key and size of every existing index on the collection. The scale argument affects this value.
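As a consistency sketch, totalIndexSize should equal the sum of the per-index sizes in indexSizes; the fragment below uses invented values:

```javascript
// Hypothetical collStats fragment (invented values, in bytes).
const stats = {
  totalIndexSize: 16352,
  indexSizes: { _id_: 8176, a_1: 8176 },
};

// totalIndexSize should equal the sum of the per-index sizes.
const summed = Object.values(stats.indexSizes).reduce((a, b) => a + b, 0);

console.log(summed === stats.totalIndexSize);
```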
The collection validation command checks all of the structures within a namespace for correctness and returns a document containing information regarding the on-disk representation of the collection.
Warning
The validate process may consume significant system resources and impede application performance because it must scan all data in the collection.
Run the validation command in the mongo shell using the following form to validate a collection named people:
db.people.validate()
Alternatively you can use the command prototype and the db.runCommand() shell helper in the following form:
db.runCommand( { validate: "people", full: true } )
db.people.validate(true)
See also
“validate” and “validate().”
The full namespace name of the collection. Namespaces include the database name and the collection name in the form database.collection.
The disk location of the first extent in the collection. The value of this field also includes the namespace.
The disk location of the last extent in the collection. The value of this field also includes the namespace.
The number of extents in the collection.
validate returns one instance of this document for every extent in the collection. This sub-document is only returned when you specify the full option to the command.
The disk location for the beginning of this extent.
The disk location for the extent following this one. “null” if this is the end of the linked list of extents.
The disk location for the extent preceding this one. “null” if this is the head of the linked list of extents.
The namespace this extent belongs to (should be the same as the namespace shown at the beginning of the validate listing).
The number of bytes in this extent.
The disk location of the first record in this extent.
The disk location of the last record in this extent.
The number of bytes in all data records. This value does not include deleted records, nor does it include extent headers, nor record headers, nor space in a file unallocated to any extent. datasize includes record padding.
The size of the last new extent created in this collection. This value determines the size of the next extent created.
A floating point value between 1 and 2.
When MongoDB creates a new record it uses the padding factor to determine how much additional space to add to the record. The padding factor is automatically adjusted by mongod when it notices that update operations are triggering record moves.
The size of the first extent created in this collection. This data is similar to the data provided by the extents sub-document; however, the data reflects only the first extent in the collection and is always returned.
The disk location for the beginning of this extent.
The disk location for the extent following this one. “null” if this is the end of the linked list of extents, which should only be the case if there is only one extent.
The disk location for the extent preceding this one. This should always be “null.”
The namespace this extent belongs to (should be the same as the namespace shown at the beginning of the validate listing).
The number of bytes in this extent.
The disk location of the first record in this extent.
The disk location of the last record in this extent.
The number of records actually encountered in a scan of the collection. This field should have the same value as the nrecords field.
The number of records containing BSON documents that do not pass a validation check.
Note
This field is only included in the validation output when you specify the full option.
This is similar to datasize, except that bytesWithHeaders includes the record headers. In version 2.0, record headers are 16 bytes per document.
Note
This field is only included in the validation output when you specify the full option.
bytesWithoutHeaders returns data collected from a scan of all records. The value should be the same as datasize.
Note
This field is only included in the validation output when you specify the full option.
The number of deleted or “free” records in the collection.
The size of all deleted or “free” records in the collection.
The number of indexes on the data in the collection.
A document containing a field for each index, named after the index’s name, that contains the number of keys, or documents referenced, included in the index.
Boolean. true, unless validate determines that an aspect of the collection is not valid. When false, see the errors field for more information.
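Several of the fields above are mutually consistent, which can be sanity-checked programmatically. A sketch against an invented, abridged validate result (field names follow the descriptions above; all values are made up):

```javascript
// Hypothetical (abridged) validate output; all values invented.
const result = {
  nrecords: 1000,
  objectsFound: 1000,
  datasize: 160000,
  bytesWithHeaders: 176000,    // datasize plus 16-byte record headers (2.0)
  bytesWithoutHeaders: 160000, // should match datasize
  valid: true,
};

// objectsFound should match nrecords, and in version 2.0 the header
// overhead should be 16 bytes per record.
const headerBytes = result.bytesWithHeaders - result.bytesWithoutHeaders;
const consistent =
  result.objectsFound === result.nrecords &&
  headerBytes === 16 * result.nrecords;

console.log(consistent && result.valid);
```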
mongos instances maintain a pool of connections for interacting with constituent members of the sharded cluster. Additionally, mongod instances maintain connections with other shards in the cluster for migrations. The connPoolStats command returns statistics regarding these connections between the mongos and mongod instances or between the mongod instances in a shard cluster.
Note
connPoolStats only returns meaningful results for mongos instances and for mongod instances in sharded clusters.
The sub-documents of the hosts document report connections between the mongos or mongod instance and each component mongod of the sharded cluster.
hosts.[host].available reports the total number of connections that the mongos or mongod could use to connect to this mongod.
hosts.[host].created reports the number of connections that this mongos or mongod has ever created for this host.
replicaSets is a document that contains replica set information for the sharded cluster.
The replicaSets.shard document reports on each shard within the sharded cluster.
The replicaSets.[shard].host field holds an array of documents that report on each host within the shard in the replica set.
These values derive from the replica set status values.
replicaSets.[shard].host[n].addr reports the address for the host in the sharded cluster in the format of “[hostname]:[port]”.
replicaSets.[shard].host[n].ok reports false when:
This field is for internal use.
replicaSets.[shard].host[n].ismaster reports true if this replicaSets.[shard].host is the primary member of the replica set.
replicaSets.[shard].host[n].hidden reports true if this replicaSets.[shard].host is a hidden member of the replica set.
replicaSets.[shard].host[n].secondary reports true if this replicaSets.[shard].host is a secondary member of the replica set.
replicaSets.[shard].host[n].pingTimeMillis reports the ping time in milliseconds from the mongos or mongod to this replicaSets.[shard].host.
New in version 2.2.
replicaSets.[shard].host[n].tags reports the members[n].tags, if this member of the set has tags configured.
replicaSets.[shard].master reports the ordinal identifier of the host in the replicaSets.[shard].host array that is the primary of the replica set.
Deprecated since version 2.2.
replicaSets.[shard].nextSlave reports the secondary member that the mongos will use to service the next request for this replica set.
The createdByType document reports the number of each type of connection that mongos or mongod has created in all connection pools.
mongos connects to mongod instances using one of three types of connections. The following sub-document reports the total number of connections by type.
createdByType.master reports the total number of connections to the primary member in each cluster.
createdByType.set reports the total number of connections to a replica set member.
createdByType.sync reports the total number of config database connections.
totalAvailable reports the running total of connections from the mongos or mongod to all mongod instances in the sharded cluster available for use. This value does not reflect those connections that are currently in use.
totalCreated reports the total number of connections ever created from the mongos or mongod to all mongod instances in the sharded cluster.
numDBClientConnection reports the total number of connections from the mongos or mongod to all of the mongod instances in the sharded cluster.
numAScopedConnection reports the number of exception safe connections created from mongos or mongod to all mongod in the sharded cluster. The mongos or mongod releases these connections after receiving a socket exception from the mongod.
The replSetGetStatus provides an overview of the current status of a replica set. Issue the following command against the admin database, in the mongo shell:
db.runCommand( { replSetGetStatus: 1 } )
You can also use the following helper in the mongo shell to access this functionality
rs.status()
The value specified (e.g. 1 above) does not impact the output of the command. Data provided by this command derives from data included in heartbeats sent to the current instance by other members of the replica set. Because of the frequency of heartbeats, these data can be several seconds out of date.
Note
The mongod must have replication enabled and be a member of a replica set for replSetGetStatus to return successfully.
See also
“rs.status()” shell helper function, “Replication”.
The set value is the name of the replica set, configured in the replSet setting. This is the same value as _id in rs.conf().
The value of the date field is an ISODate of the current time, according to the current server. Compare this to the value of the members.lastHeartbeat to find the operational lag between the current host and the other hosts in the set.
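The comparison described above can be sketched directly. The timestamps below are invented:

```javascript
// Hypothetical timestamps from replSetGetStatus output (invented).
const serverDate = new Date("2012-06-01T12:00:10Z");    // top-level date field
const lastHeartbeat = new Date("2012-06-01T12:00:08Z"); // members[n].lastHeartbeat

// Operational lag, in seconds, between this host and the remote member.
const lagSeconds = (serverDate - lastHeartbeat) / 1000;

console.log(lagSeconds);
```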
The value of myState reflects the state of the current replica set member. An integer between 0 and 10 represents the state of the member. These integers map to states, as described in the following table:
| Number | State |
| 0 | Starting up, phase 1 (parsing configuration) |
| 1 | Primary |
| 2 | Secondary |
| 3 | Recovering (initial syncing, post-rollback, stale members) |
| 4 | Fatal error |
| 5 | Starting up, phase 2 (forking threads) |
| 6 | Unknown state (the set has never connected to the member) |
| 7 | Arbiter |
| 8 | Down |
| 9 | Rollback |
| 10 | Removed |
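The table above can be expressed as a simple lookup, e.g. when rendering a state number as a label. This is a sketch; the label strings are paraphrased from the table, not defined by MongoDB:

```javascript
// Replica set member states, per the table above (labels abbreviated).
const REPLICA_STATES = {
  0: "Starting up, phase 1",
  1: "Primary",
  2: "Secondary",
  3: "Recovering",
  4: "Fatal error",
  5: "Starting up, phase 2",
  6: "Unknown",
  7: "Arbiter",
  8: "Down",
  9: "Rollback",
  10: "Removed",
};

// Resolve a myState (or members.state) value to a readable label.
function stateName(myState) {
  return REPLICA_STATES[myState] || "Invalid state";
}

console.log(stateName(1), stateName(7));
```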
The members field holds an array that contains a document for every member in the replica set. See the “Member Statuses” for an overview of the values included in these documents.
The syncingTo field is only present on the output of rs.status() on secondary and recovering members, and holds the hostname of the member from which this instance is syncing.
The name field holds the name of the server.
The self field is only included in the document for the current mongod instance in the members array. Its value is true.
This field contains the most recent error or status message received from the member. This field may be empty (e.g. "") in some cases.
The health value is only present for the other members of the replica set (i.e. not the member that returns rs.status()). This field conveys whether the member is up (i.e. 1) or down (i.e. 0).
The value of the members.state reflects the state of this replica set member. An integer between 0 and 10 represents the state of the member. These integers map to states, as described in the following table:
| Number | State |
| 0 | Starting up, phase 1 (parsing configuration) |
| 1 | Primary |
| 2 | Secondary |
| 3 | Recovering (initial syncing, post-rollback, stale members) |
| 4 | Fatal error |
| 5 | Starting up, phase 2 (forking threads) |
| 6 | Unknown state (the set has never connected to the member) |
| 7 | Arbiter |
| 8 | Down |
| 9 | Rollback |
| 10 | Removed |
A string that describes members.state.
The members.uptime field holds a value that reflects the number of seconds that this member has been online.
This value does not appear for the member that returns the rs.status() data.
A document that contains information regarding the last operation from the operation log that this member has applied.
A 32-bit timestamp of the last operation applied to this member of the replica set from the oplog.
An incremented field that reflects the number of operations since the last timestamp. This value only increases if there is more than one operation per second.
An ISODate formatted date string that reflects the last entry from the oplog that this member applied. If this differs significantly from members.lastHeartbeat, this member is either experiencing “replication lag” or there have not been any new operations since the last update. Compare members.optimeDate between all of the members of the set.
The lastHeartbeat value provides an ISODate formatted date of the last heartbeat received from this member. Compare this value to the value of the date field to track latency between these members.
This value does not appear for the member that returns the rs.status() data.
The pingMS represents the number of milliseconds (ms) that a round-trip packet takes to travel between the remote member and the local instance.
This value does not appear for the member that returns the rs.status() data.
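Comparing members.optimeDate between members, as suggested above, lends itself to a small script. The following is an illustrative sketch in plain JavaScript over a document shaped like rs.status() output; it is not a built-in shell helper:

```javascript
// Sketch: estimate replication lag by comparing each secondary's optimeDate
// to the primary's, given a document shaped like rs.status() output.
function replicationLagSeconds(status) {
  // state 1 is primary, state 2 is secondary (see the state table above)
  var primary = status.members.filter(function (m) { return m.state === 1; })[0];
  var lags = {};
  status.members.forEach(function (m) {
    if (m.state === 2) {
      lags[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  });
  return lags;
}
```

A lag near zero for all secondaries indicates healthy replication; a steadily growing value points to replication lag or an idle primary.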
This reference provides an overview of all possible replica set configuration options and settings.
Use rs.conf() in the mongo shell to retrieve this configuration. Note that default values are not explicitly displayed.
Type: string
Value: <setname>
An _id field holding the name of the replica set. This reflects the set name configured with replSet or mongod --replSet.
Type: array
Contains an array holding an embedded document for each member of the replica set. The members document contains a number of fields that describe the configuration of each member of the replica set.
The members field in the replica set configuration document is a zero-indexed array.
Type: ordinal
Provides the zero-indexed identifier of every member in the replica set.
Note
When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.
The _id rarely corresponds to the array index.
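Because the _id rarely matches the array position, a small helper can recover the index for a given member before reconfiguring. This is an illustrative sketch, not part of the shell API:

```javascript
// Sketch: find the position in conf.members of the member with a given _id.
// The _id value rarely matches the array index, so look it up explicitly.
function indexOfMemberId(conf, id) {
  for (var i = 0; i < conf.members.length; i++) {
    if (conf.members[i]._id === id) {
      return i;
    }
  }
  return -1; // no member with that _id
}
```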
Type: <hostname>:<port>
Identifies the host name of the set member with a hostname and port number. This name must be resolvable for every host in the replica set.
Warning
members[n].host cannot hold a value that resolves to localhost or the local interface unless all members of the set are on hosts that resolve to localhost.
Optional.
Type: boolean
Default: false
Identifies an arbiter. For arbiters, this value is true, and is automatically configured by rs.addArb().
Optional.
Type: boolean
Default: true
Determines whether the mongod builds indexes on this member. Do not set to false if this member can become primary, or if any clients ever issue queries against this instance.
Omitting index creation, and thus setting this value to false, may be useful only in limited situations, such as on members that exist solely to provide backups and never service queries.
If set to false, secondaries configured with this option do build indexes on the _id field, to facilitate operations required for replication.
Warning
You may only set this value when adding a member to a replica set. You may not reconfigure a replica set to change the value of the members[n].buildIndexes field after adding the member to the set.
Furthermore, other secondaries cannot synchronize off of replica set members where members[n].buildIndexes is false.
Optional.
Type: boolean
Default: false
When this value is true, the replica set hides this instance, and does not include the member in the output of db.isMaster() or isMaster. This prevents read operations (i.e. queries) from ever reaching this host by way of secondary read preference.
See also
Optional.
Type: Number, between 0 and 100.0 including decimals.
Default: 1
Specify higher values to make a member more eligible to become primary, and lower values to make the member less eligible to become primary. Priorities are only used in comparison to each other; members of the set will veto elections from a member when another eligible member has a higher absolute priority value. Changing the balance of priority in a replica set will trigger an election.
A members[n].priority of 0 makes it impossible for a member to become primary.
See also
Optional.
Type: MongoDB Document
Default: none
Used to represent arbitrary values for describing or tagging members for the purposes of extending write concern to allow configurable data center awareness.
Use in conjunction with settings.getLastErrorModes and settings.getLastErrorDefaults and db.getLastError() (i.e. getLastError).
Optional.
Type: Integer (seconds).
Default: 0
Describes the number of seconds “behind” the primary that this replica set member should “lag.” Use this option to create delayed members that maintain a copy of the data reflecting the state of the data set some amount of time in the past (specified in seconds). Typically these members help protect against human error, and provide some measure of insurance against the unforeseen consequences of changes and updates.
Optional.
Type: Integer
Default: 1
Controls the number of votes a server has in a replica set election. The number of votes each member has can be any non-negative integer, but it is highly recommended each member has 1 or 0 votes.
If you need more than 7 members, use this setting to add additional non-voting members with a members[n].votes value of 0.
For most deployments and most members, use the default value, 1, for members[n].votes.
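Because the set supports at most seven voting members, a quick check of the configuration document can confirm how many members still vote. The helper below is an illustrative sketch, assuming the default of one vote when the votes field is omitted:

```javascript
// Sketch: count voting members in a replica set configuration document.
// Members omit the votes field when using the default value of 1.
function votingMemberCount(conf) {
  return conf.members.filter(function (m) {
    return (m.votes === undefined ? 1 : m.votes) > 0;
  }).length;
}
```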
Optional.
Type: MongoDB Document
The settings document holds two optional fields, which affect the available write concern options and default configurations.
Optional.
Type: MongoDB Document
Specifies the default arguments to the getLastError command that members of this replica set use when a client calls getLastError with no arguments. If you specify any arguments to getLastError, MongoDB ignores these defaults.
Optional.
Type: MongoDB Document
Defines the names and combinations of tags for use by the application layer to guarantee write concern to the database using the getLastError command, providing data-center awareness.
The following document provides a representation of a replica set configuration document. Angle brackets (e.g. < and >) enclose all optional fields.
{
_id : <setname>,
version: <int>,
members: [
{
_id : <ordinal>,
host : hostname<:port>,
<arbiterOnly : <boolean>,>
<buildIndexes : <boolean>,>
<hidden : <boolean>,>
<priority: <priority>,>
<tags: { <document> },>
<slaveDelay : <number>,>
<votes : <number>>
}
, ...
],
<settings: {
<getLastErrorDefaults : <lasterrdefaults>,>
<getLastErrorModes : <modes>>
}>
}
Most modifications of replica set configuration use the mongo shell. Consider the following reconfiguration operation:
Example
Given the following replica set configuration:
{
"_id" : "rs0",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:27017"
},
{
"_id" : 1,
"host" : "mongodb1.example.net:27017"
},
{
"_id" : 2,
"host" : "mongodb2.example.net:27017"
}
]
}
And the following reconfiguration operation:
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)
This operation begins by saving the current replica set configuration to the local variable cfg using the rs.conf() method. Then it sets priority values for the three sub-documents in the members array of the cfg document. Finally, it calls the rs.reconfig() method with the argument of cfg to apply the new configuration. The replica set configuration after this operation will resemble the following:
{
"_id" : "rs0",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:27017",
"priority" : 0.5
},
{
"_id" : 1,
"host" : "mongodb1.example.net:27017",
"priority" : 2
},
{
"_id" : 2,
"host" : "mongodb2.example.net:27017",
"priority" : 2
}
]
}
Using the “dot notation” demonstrated in the above example, you can modify any existing setting or specify any of the optional replica set configuration variables. Until you run rs.reconfig(cfg) at the shell, no changes will take effect. You can issue cfg = rs.conf() at any time before using rs.reconfig() to undo your changes and start from the current configuration. If you issue cfg as an operation at any point, the mongo shell will output the complete document with your modifications for review.
The rs.reconfig() operation has a “force” option, to make it possible to reconfigure a replica set if a majority of the replica set is not visible, and there is no primary member of the set. Use the following form:
rs.reconfig(cfg, { force: true } )
Warning
Forcing a rs.reconfig() can lead to rollbacks and other situations that are difficult to recover from. Exercise caution when using this option.
Note
The rs.reconfig() shell method can force the current primary to step down and cause an election in some situations. When the primary steps down, all clients will disconnect. This is by design. Because this typically takes 10-20 seconds, attempt to make such changes during scheduled maintenance periods.
Tag sets provide custom and configurable write concern and read preferences for a replica set. This section outlines the process for specifying tags for a replica set; for more information see the full documentation of the behavior of tag sets for write concern and tag sets for read preference.
Configure tag sets by adding fields and values to the document stored in members[n].tags. Consider the following example:
Example
Given the following replica set configuration:
{
"_id" : "rs0",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:27017"
},
{
"_id" : 1,
"host" : "mongodb1.example.net:27017"
},
{
"_id" : 2,
"host" : "mongodb2.example.net:27017"
}
]
}
You could add tag sets to the members of this replica set with the following command sequence in the mongo shell:
conf = rs.conf()
conf.members[0].tags = { "dc": "east", "use": "production" }
conf.members[1].tags = { "dc": "east", "use": "reporting" }
conf.members[2].tags = { "use": "production" }
rs.reconfig(conf)
After this operation the output of rs.conf() would resemble the following:
{
"_id" : "rs0",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:27017",
"tags" : {
"dc": "east",
"use": "production"
}
},
{
"_id" : 1,
"host" : "mongodb1.example.net:27017",
"tags" : {
"dc": "east",
"use": "reporting"
}
},
{
"_id" : 2,
"host" : "mongodb2.example.net:27017",
"tags" : {
"use": "production"
}
}
]
}
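The tag values above become useful once settings.getLastErrorModes refers to them. The semantics of a mode (each field names a tag, and its value is the number of distinct values of that tag that must acknowledge a write) can be sketched in plain JavaScript. This models the rule for illustration; it is not a driver API:

```javascript
// Sketch of getLastErrorModes semantics: a write satisfies a mode when, for
// every tag named in the mode, the acknowledging members cover at least the
// required number of *distinct* values of that tag.
function satisfiesMode(ackedMembers, mode) {
  return Object.keys(mode).every(function (tag) {
    var distinct = {};
    ackedMembers.forEach(function (m) {
      if (m.tags && m.tags[tag] !== undefined) {
        distinct[m.tags[tag]] = true;
      }
    });
    return Object.keys(distinct).length >= mode[tag];
  });
}
```

With the configuration above, for example, a mode of { "dc": 2 } could never be satisfied, because only one distinct dc value ("east") exists in the set, while { "use": 2 } can be, since both "production" and "reporting" are present.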
The db.getReplicationInfo() method provides the current status of the replica set, using data polled from the “oplog”. Consider the values of this output when diagnosing issues with replication.
See also
“Replication Fundamentals” for more information on replication.
The following fields are present in the output of db.getReplicationInfo() for both primary and secondary nodes.
The following fields appear in the output of db.getReplicationInfo() for primary nodes.
Returns the last error status.
The following fields appear in the output of db.getReplicationInfo() for secondary nodes.
Returns the difference between the first and last operation in the oplog, represented in seconds.
Returns the difference between the first and last operation in the oplog, rounded and represented in hours.
Returns a time stamp for the first (i.e. earliest) operation in the oplog. Compare this value to the last write operation issued against the server.
Changed in version 2.2.
The db.currentOp() helper in the mongo shell reports on the current operations running on the mongod instance. The operation returns the inprog array, which contains a document for each in progress operation. Consider the following example output:
{
"inprog": [
{
"opid" : 3434473,
"active" : <boolean>,
"secs_running" : 0,
"op" : "<operation>",
"ns" : "<database>.<collection>",
"query" : {
},
"client" : "<host>:<outgoing>",
"desc" : "conn57683",
"threadId" : "0x7f04a637b700",
"connectionId" : 57683,
"locks" : {
"^" : "w",
"^local" : "W",
"^<database>" : "W"
},
"waitingForLock" : false,
"msg": "<string>",
"numYields" : 0,
"progress" : {
"done" : <number>,
"total" : <number>
},
"lockStats" : {
"timeLockedMicros" : {
"R" : NumberLong(),
"W" : NumberLong(),
"r" : NumberLong(),
"w" : NumberLong()
},
"timeAcquiringMicros" : {
"R" : NumberLong(),
"W" : NumberLong(),
"r" : NumberLong(),
"w" : NumberLong()
}
}
},
]
}
Optional
You may specify the true argument to db.currentOp() to return a more verbose output including idle connections and system operations. For example:
db.currentOp(true)
Furthermore, active operations (i.e. where active is true) will return additional fields.
You can use db.killOp() in conjunction with the opid field to terminate a currently running operation. Consider the following JavaScript operations for the mongo shell that you can use to filter the output and identify specific types of operations:
Return all pending write operations:
db.currentOp().inprog.forEach(
function(d){
if(d.waitingForLock && d.lockType != "read")
printjson(d)
})
Return the active write operation:
db.currentOp().inprog.forEach(
function(d){
if(d.active && d.lockType == "write")
printjson(d)
})
Return all active read operations:
db.currentOp().inprog.forEach(
function(d){
if(d.active && d.lockType == "read")
printjson(d)
})
Holds an identifier for the operation. You can pass this value to db.killOp() in the mongo shell to terminate the operation.
A boolean value that is true if the operation is currently running or false if the operation is queued and waiting for a lock to run.
The duration of the operation in seconds. MongoDB calculates this value by subtracting the current time from the start time of the operation.
A string that identifies the type of operation. The possible values are:
The namespace the operation targets. MongoDB forms namespaces using the name of the database and the name of the collection.
A document containing the current operation’s query. The document is empty for operations that do not have queries: getmore, insert, and command.
The IP address (or hostname) and the ephemeral port of the client connection where the operation originates. If your inprog array has operations from many different clients, use this string to relate operations to clients.
For some commands, including findAndModify and db.eval(), the client will be 0.0.0.0:0, rather than an actual client.
A description of the client. This string includes the connectionId.
An identifier for the thread that services the operation and its connection.
An identifier for the connection where the operation originated.
New in version 2.2.
The locks document reports on the kinds of locks the operation currently holds. The following kinds of locks are possible:
locks.^ reports on the global lock state for the mongod instance. The operation must hold this for some global phases of an operation.
locks.^local reports on the lock for the local database. MongoDB uses the local database for a number of operations, but the most frequent use of the local database is for the oplog used in replication.
locks.^<database> reports on the lock state for the database that this operation targets.
Changed in version 2.2: The locks field replaced the lockType field.
Identifies the type of lock the operation currently holds. The possible values are:
Returns a boolean value. waitingForLock is true if the operation is waiting for a lock and false if the operation has the required lock.
The msg provides a message that describes the status and progress of the operation. In the case of indexing or mapReduce operations, the field reports the completion percentage.
Reports on the progress of mapReduce or indexing operations. The progress field corresponds to the completion percentage in the msg field. progress specifies the following information:
Reports the number completed.
Reports the total number.
Returns true if the mongod instance is in the process of killing the operation.
numYields is a counter that reports the number of times the operation has yielded to allow other operations to complete.
Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete quickly while MongoDB reads in data for the yielding operation.
New in version 2.2.
The lockStats document reflects the amount of time the operation has spent both acquiring and holding locks. lockStats reports data on a per-lock-type basis, with the following possible lock types:
- timeLockedMicros¶
The timeLockedMicros document reports the amount of time the operation has spent holding a specific lock.
- timeLockedMicros.R¶
Reports the amount of time in microseconds the operation has held the global read lock.
- timeLockedMicros.W¶
Reports the amount of time in microseconds the operation has held the global write lock.
- timeLockedMicros.r¶
Reports the amount of time in microseconds the operation has held the database specific read lock.
- timeLockedMicros.w¶
Reports the amount of time in microseconds the operation has held the database specific write lock.
The timeAcquiringMicros document reports the amount of time the operation has spent waiting to acquire a specific lock.
Reports the amount of time in microseconds the operation has waited for the global read lock.
Reports the amount of time in microseconds the operation has waited for the global write lock.
Reports the amount of time in microseconds the operation has waited for the database specific read lock.
Reports the amount of time in microseconds the operation has waited for the database specific write lock.
The database profiler captures information about read and write operations, cursor operations, and database commands. To configure the database profiler and set the thresholds for capturing profile data, see the Analyze Performance of Database Operations section.
The database profiler writes data in the system.profile collection, which is a capped collection. To view the profiler’s output, use normal MongoDB queries on the system.profile collection.
Note
Because the database profiler writes data to the system.profile collection in a database, the profiler will profile some write activity, even for databases that are otherwise read-only.
The documents in the system.profile collection have the following form. This example document reflects an update operation:
{
"ts" : ISODate("2012-12-10T19:31:28.977Z"),
"op" : "update",
"ns" : "social.users",
"query" : {
"name" : "jane"
},
"updateobj" : {
"$set" : {
"likes" : [
"basketball",
"trekking"
]
}
},
"nscanned" : 8,
"moved" : true,
"nmoved" : 1,
"nupdated" : 1,
"keyUpdates" : 0,
"numYield" : 0,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(0),
"w" : NumberLong(258)
},
"timeAcquiringMicros" : {
"r" : NumberLong(0),
"w" : NumberLong(7)
}
},
"millis" : 0,
"client" : "127.0.0.1",
"user" : ""
}
The database profiler reports the following values. The set of values reported for a given operation depends on the operation:
The timestamp of the operation.
The type of operation. The possible values are:
The namespace the operation targets. Namespaces in MongoDB take the form of the database, followed by a dot (.), followed by the name of the collection.
The query document used. See Query Specification Documents for more information on these documents, and Meta Query Operators for more information.
The command operation.
The update document passed in during an update operation.
The ID of the cursor accessed by a getmore operation.
The number of documents the operation specified to return. For example, the profile command would return one document (a results document) so the ntoreturn value would be 1. The limit(5) command would return five documents so the ntoreturn value would be 5.
If the ntoreturn value is 0, the command did not specify a number of documents to return, as would be the case with a simple find() command with no limit specified.
The number of documents that MongoDB scans in the index in order to carry out the operation.
In general, if nscanned is much higher than nreturned, the database is scanning many objects to find the target objects. Consider creating an index to improve this.
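That comparison of nscanned to nreturned lends itself to a simple scan of profiler output. The following is an illustrative sketch in plain JavaScript over system.profile-shaped documents; the 10x threshold shown in the test is an arbitrary assumption, not a MongoDB default:

```javascript
// Sketch: flag profiled operations whose nscanned greatly exceeds nreturned,
// a common sign that a query would benefit from an index.
function inefficientOps(profileDocs, ratio) {
  return profileDocs.filter(function (d) {
    return d.nreturned > 0 && d.nscanned / d.nreturned >= ratio;
  });
}
```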
If moved has a value of true, the update operation moved one or more documents to a new location on disk. Such operations take more time than in-place updates and typically occur when documents grow as a result of update operations.
The number of documents moved on disk by the operation.
The number of documents updated by the operation.
The number of index keys the update changed in the operation. Changing an index key carries a small performance cost because the database must remove the old key and insert a new key into the B-tree index.
The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation. For more information, see the FAQ on when operations yield.
The time in microseconds the operation spent acquiring and holding locks. This field reports data for the following lock types:
The time in microseconds the operation held a specific lock.
The time in microseconds the operation spent waiting to acquire a specific lock.
The number of documents returned by the operation.
The length in bytes of the operation’s result document. A large responseLength can affect performance. To limit the size of the result document for a query operation, you can use any of the following:
The time in milliseconds for the server to perform the operation. This time does not include network time nor time to acquire the lock.
The IP address or hostname of the client connection where the operation originates.
For some operations, such as db.eval(), the client is 0.0.0.0:0 instead of an actual client.
The authenticated user who ran the operation.
This document explains the output of the $explain operator and the mongo shell method explain().
The Core Explain Output fields display information for queries on non-sharded collections. For queries on sharded collections, explain returns this information for each shard the query accesses.
{
"cursor" : "<Cursor Type and Index>",
"isMultiKey" : <boolean>,
"n" : <num>,
"nscannedObjects" : <num>,
"nscanned" : <num>,
"nscannedObjectsAllPlans" : <num>,
"nscannedAllPlans" : <num>,
"scanAndOrder" : <boolean>,
"indexOnly" : <boolean>,
"nYields" : <num>,
"nChunkSkips" : <num>,
"millis" : <num>,
"indexBounds" : { <index bounds> },
"allPlans" : [
{ "cursor" : "<Cursor Type and Index>",
"n" : <num>,
"nscannedObjects" : <num>,
"nscanned" : <num>,
"indexBounds" : { <index bounds> }
},
...
],
"oldPlan" : {
"cursor" : "<Cursor Type and Index>",
"indexBounds" : { <index bounds> }
},
"server" : "<host:port>"
}
Queries with the $or operator execute each clause of the $or expression in parallel and can use separate indexes on the individual clauses. If the query uses indexes on any or all of the query’s clauses, explain() contains output for each clause as well as the cumulative data for the entire query:
{
"clauses" : [
{
<core explain output>
},
{
<core explain output>
},
...
],
"n" : <num>,
"nscannedObjects" : <num>,
"nscanned" : <num>,
"nscannedObjectsAllPlans" : <num>,
"nscannedAllPlans" : <num>,
"millis" : <num>,
"server" : "<host:port>"
}
For queries on a sharded collection, the output contains the Core Explain Output for each accessed shard and cumulative shard information:
{
"clusteredType" : "<Shard Access Type>",
"shards" : {
"<shard1>" : [
{
<core explain output>
}
],
"<shard2>" : [
{
<core explain output>
}
],
...
},
"millisShardTotal" : <num>,
"millisShardAvg" : <num>,
"numQueries" : <num>,
"numShards" : <num>,
"cursor" : "<Cursor Type and Index>",
"n" : <num>,
"nChunkSkips" : <num>,
"nYields" : <num>,
"nscanned" : <num>,
"nscannedAllPlans" : <num>,
"nscannedObjects" : <num>,
"nscannedObjectsAllPlans" : <num>,
"millis" : <num>
}
Specifies the type of cursor used in the query operation:
A boolean that specifies whether the index used in this query is a multikey index on a field that holds an array.
Specifies the number of documents that match the query selection criteria.
Specifies the total number of documents scanned during the query. The nscannedObjects value may be lower than nscanned, such as if the index is a covered index.
Specifies the total number of documents or index entries scanned during the database operation. You want n and nscanned to be as close in value as possible. The nscanned value may be higher than the nscannedObjects value, such as if the index is a covered index.
New in version 2.2.
Specifies the total number of documents scanned for all query plans during the database operation.
New in version 2.2.
Specifies the total number of documents or index entries scanned for all query plans during the database operation.
New in version 2.2.
scanAndOrder is a boolean value that returns true when the query cannot use the index for returning sorted results.
When true, MongoDB must sort the documents after retrieving them, using either an index cursor or a cursor that scans the entire collection.
indexOnly is a boolean value that returns true when the query is covered by the index. In covered queries, the index contains all data that MongoDB needs to fulfill the query.
Specifies the number of times this query yielded the read lock to allow waiting writes to execute.
Specifies the number of documents skipped because of active chunk migrations in a sharded system. Typically this will be zero. A number greater than zero is acceptable, but indicates some inefficiency.
Specifies the number of milliseconds to complete the query.
Specifies the lower and upper index key bounds. This field resembles one of the following:
"indexBounds" : {
"start" : { <index key1> : <value>, ... },
"end" : { <index key1> : <value>, ... }
},
"indexBounds" : { "<field>" : [ [ <lower bound>, <upper bound> ] ],
...
}
Specifies the list of plans the query optimizer runs in order to select the index for the query. Displays only when the <verbose> parameter to explain() is true or 1.
New in version 2.2.
Specifies the previous plan selected by the query optimizer for the query. Displays only when the <verbose> parameter to explain() is true or 1.
New in version 2.2.
Specifies the MongoDB server.
Contains the Core Explain Output information for each clause of the $or expression. clauses is only included when the clauses in the $or expression use indexes.
Describes the access pattern for shards. The value is:
Specifies the shards accessed during the query and individual Core Explain Output for each shard.
Specifies the total time in milliseconds for the query to run on the shards.
Specifies the average time in millisecond for the query to run on each shard.
Specifies the total number of queries executed.
Specifies the total number of shards queried.
MongoDB will return one of the following codes and statuses when exiting. Use this guide to interpret logs and when troubleshooting issues with mongod and mongos instances.
Returned by MongoDB applications upon successful exit.
The specified options are in error or are incompatible with other options.
Returned by mongod if there is a mismatch between hostnames specified on the command line and in the local.sources collection. mongod may also return this status if the oplog collection in the local database is not readable.
The version of the database is different from the version supported by the mongod (or mongod.exe) instance. The instance exits cleanly. Restart mongod with the --upgrade option to upgrade the database to the version supported by this mongod instance.
Returned by the mongod.exe process on Windows when it receives a Control-C, Close, Break or Shutdown event.
Returned by MongoDB applications which encounter an unrecoverable error, an uncaught exception or uncaught signal. The system exits without performing a clean shut down.
Message: ERROR: wsastartup failed <reason>
Returned by MongoDB applications on Windows following an error in the WSAStartup function.
Message: NT Service Error
Returned by MongoDB applications for Windows due to failures installing, starting or removing the NT Service for the application.
Returned when a MongoDB application cannot open a file or cannot obtain a lock on a file.
MongoDB applications exit cleanly following a large clock skew (32768 milliseconds) event.
mongod exits cleanly if the server socket closes. The server socket is on port 27017 by default, or as specified to the --port run-time option.
Returned by mongod.exe or mongos.exe on Windows when either receives a shutdown message from the Windows Service Control Manager.
Returned by mongod when the process throws an uncaught exception.
The config database supports sharded cluster operation. See the Sharding section of this manual for full documentation of sharded clusters.
To access the config database, connect to a mongos instance in a sharded cluster and issue the following command:
use config
You can return a list of the collections with the following command:
show collections
The changelog collection stores a document for each change to the metadata of a sharded collection.
Example
The following example displays a single record of a chunk split from the config.changelog collection:
{
"_id" : "<hostname>-<timestamp>-<increment>",
"server" : "<hostname>:<port>",
"clientAddr" : "127.0.0.1:63381",
"time" : ISODate("2012-12-11T14:09:21.039Z"),
"what" : "split",
"ns" : "<database>.<collection>",
"details" : {
"before" : {
"min" : {
"<database>" : { $minKey : 1 }
},
"max" : {
"<database>" : { $maxKey : 1 }
},
"lastmod" : Timestamp(1000, 0),
"lastmodEpoch" : ObjectId("000000000000000000000000")
},
"left" : {
"min" : {
"<database>" : { $minKey : 1 }
},
"max" : {
"<database>" : "<value>"
},
"lastmod" : Timestamp(1000, 1),
"lastmodEpoch" : ObjectId(<...>)
},
"right" : {
"min" : {
"<database>" : "<value>"
},
"max" : {
"<database>" : { $maxKey : 1 }
},
"lastmod" : Timestamp(1000, 2),
"lastmodEpoch" : ObjectId("<...>")
}
}
}
Each document in the changelog collection contains the following fields:
The value of changelog._id is: <hostname>-<timestamp>-<increment>.
The hostname of the server that holds this data.
A string that holds the address of the client, the mongos instance that initiates this change.
Reflects the type of change recorded. Possible values are:
Namespace where the change occurred.
The chunks collection stores a document for each chunk in the cluster. Consider the following example of a document for a chunk named mydb.foo-a_\"cat\":
{
"_id" : "mydb.foo-a_\"cat\"",
"lastmod" : Timestamp(1000, 3),
"lastmodEpoch" : ObjectId("5078407bd58b175c5c225fdc"),
"ns" : "mydb.foo",
"min" : {
"animal" : "cat"
},
"max" : {
"animal" : "dog"
},
"shard" : "shard0004"
}
These documents store the range of values for the shard key that describe the chunk in the min and max fields. Additionally the shard field identifies the shard in the cluster that “owns” the chunk.
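The min and max bounds define a half-open range: a chunk owns shard-key values greater than or equal to min and strictly less than max. As a rough sketch of how routing by this metadata works (handling only single-field keys and the $minKey/$maxKey sentinels, unlike a real mongos), chunk lookup can be written as:

```javascript
// Sketch: find the chunk whose half-open [min, max) range contains a given
// shard-key value, roughly mirroring how mongos routes by chunk metadata.
function findChunk(chunks, field, value) {
  return chunks.filter(function (c) {
    var lo = c.min[field], hi = c.max[field];
    // $minKey / $maxKey sentinels match everything on their side
    var aboveMin = (lo && lo.$minKey) ? true : value >= lo;
    var belowMax = (hi && hi.$maxKey) ? true : value < hi;
    return aboveMin && belowMax;
  })[0];
}
```

Given the example chunk above, a document with "animal": "cow" falls in the [ "cat", "dog" ) range and therefore lives on shard0004.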
The collections collection stores a document for each sharded collection in the cluster. Given a collection named pets in the records database, a document in the collections collection would resemble the following:
{
"_id" : "records.pets",
"lastmod" : ISODate("1970-01-16T15:00:58.107Z"),
"dropped" : false,
"key" : {
"a" : 1
},
"unique" : false,
"lastmodEpoch" : ObjectId("5078407bd58b175c5c225fdc")
}
The databases collection stores a document for each database in the cluster, and tracks whether the database has sharding enabled. databases represents each database in a distinct document. When a database has sharding enabled, the primary field holds the name of the primary shard.
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydb", "partitioned" : true, "primary" : "shard0000" }
The lockpings collection keeps track of the active components in the sharded cluster. Given a cluster with a mongos running on example.com:30000, the document in the lockpings collection would resemble:
{ "_id" : "example.com:30000:1350047994:16807", "ping" : ISODate("2012-10-12T18:32:54.892Z") }
The locks collection stores a distributed lock. This ensures that only one mongos instance can perform administrative tasks on the cluster at once. The mongos acting as balancer takes a lock by inserting a document resembling the following into the locks collection.
{
"_id" : "balancer",
"process" : "example.net:40000:1350402818:16807",
"state" : 2,
"ts" : ObjectId("507daeedf40e1879df62e5f3"),
"when" : ISODate("2012-10-16T19:01:01.593Z"),
"who" : "example.net:40000:1350402818:16807:Balancer:282475249",
"why" : "doing balance round"
}
If a mongos holds the balancer lock, the state field has a value of 2, which means that the balancer is active. The when field indicates when the balancer began the current operation.
Changed in version 2.0: The value of the state field was 1 before MongoDB 2.0.
The mongos collection stores a document for each mongos instance affiliated with the cluster. mongos instances send pings to all members of the cluster every 30 seconds so the cluster can verify that the mongos is active. The ping field shows the time of the last ping. The cluster maintains this collection for reporting purposes.
The following document shows the status of the mongos running on example.com:30000.
{ "_id" : "example.com:30000", "ping" : ISODate("2012-10-12T17:08:13.538Z"), "up" : 13699, "waiting" : true }
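Given the 30-second ping interval, an administrator can judge whether a mongos is active by the age of its last ping. The following Python sketch is illustrative only; the allowed number of missed pings is an assumption, not a documented server value:

```python
# Illustrative sketch (the missed-ping threshold is an assumption,
# not a documented server value): judging whether a mongos is
# active from the age of its last ping, given 30-second pings.
from datetime import datetime, timedelta

PING_INTERVAL = timedelta(seconds=30)

def is_active(last_ping, now, missed_pings_allowed=2):
    return now - last_ping <= PING_INTERVAL * missed_pings_allowed

now = datetime(2012, 10, 12, 17, 8, 45)
print(is_active(datetime(2012, 10, 12, 17, 8, 13), now))  # True (32s ago)
print(is_active(datetime(2012, 10, 12, 17, 0, 0), now))   # False
```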
The settings collection holds the following sharding configuration settings:
The following is an example settings collection:
{ "_id" : "chunksize", "value" : 64 }
{ "_id" : "balancer", "stopped" : false }
The shards collection represents each shard in the cluster in a separate document. If the shard is a replica set, the host field displays the name of the replica set, then a slash, then the hostname, as in the following example:
{ "_id" : "shard0000", "host" : "shard1/localhost:30000" }
The version collection holds the current metadata version number. This collection contains only one document:
To access the version collection you must use the db.getCollection() method. For example, to display the collection’s document:
mongos> db.getCollection("version").find()
{ "_id" : 1, "version" : 3 }
Note
Like all databases in MongoDB, the config database contains a system.indexes collection, which holds metadata for all indexes in the database. For information on indexes, see Indexes.
Every mongod instance has its own local database, which stores data used in the replication process, and other instance-specific data. The local database is invisible to replication: collections in the local database are not replicated.
When running with authentication (i.e. auth), authenticating against the local database is equivalent to authenticating against the admin database. This authentication gives access to all databases.
In replication, the local database stores internal replication data for each member of a replica set. The local database contains the following collections used for replication:
local.system.replset holds the replica set’s configuration object as its single document. To view the object’s configuration information, issue rs.conf() from the mongo shell. You can also query this collection directly.
local.oplog.rs is the capped collection that holds the oplog. You set its size at creation using the oplogSize setting. To resize the oplog after replica set initiation, use the Change the Size of the Oplog procedure. For additional information, see the Oplog Internals topic in this document and the Oplog topic in the Replication Fundamentals document.
This contains an object used internally by replica sets to track sync status.
This contains information about each member of the set.
MongoDB stores system information in collections that use the <database>.system.* namespace, which MongoDB reserves for internal use. Do not create collections that begin with system..
MongoDB also stores some additional instance-local metadata in the local database, specifically for replication purposes.
System collections include these collections stored directly in the database:
The <database>.system.namespaces collection contains information about all of the database’s collections. Additional namespace metadata exists in the database.ns files and is opaque to database users.
The <database>.system.indexes collection lists all the indexes in the database. Add and remove data from this collection via the ensureIndex() and dropIndex() methods.
The <database>.system.profile collection stores database profiling information. For information on profiling, see Database Profiling.
The <database>.system.users collection stores credentials for users who have access to the database. For more information on this collection, see Authentication.
The <database>.system.js collection holds special JavaScript code for use in server side JavaScript. See Storing Functions Server-side for more information.
This document provides a collection of hard and soft limitations of the MongoDB system.
The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS.
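An application can decide between a single document and GridFS with a simple size check against the 16 megabyte limit, as in this illustrative Python sketch:

```python
# Illustrative sketch: deciding between a single BSON document and
# GridFS based on the 16 megabyte document size limit.
MAX_BSON_SIZE = 16 * 1024 * 1024

def needs_gridfs(payload_bytes):
    # Payloads at or above the limit cannot fit in one document.
    return payload_bytes >= MAX_BSON_SIZE

print(needs_gridfs(4 * 1024))          # False
print(needs_gridfs(20 * 1024 * 1024))  # True
```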
Changed in version 2.2.
MongoDB supports no more than 100 levels of nesting for BSON documents.
Each namespace, including database and collection name, must be shorter than 123 bytes.
The limitation on the number of namespaces is the size of the namespace file divided by 628.
A 16 megabyte namespace file can support approximately 24,000 namespaces. Each index also counts as a namespace.
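A back-of-envelope check of this arithmetic, as an illustrative Python sketch: dividing the namespace file size by 628 bytes per entry gives the theoretical upper bound, of which the "approximately 24,000" figure above is the practically usable portion.

```python
# Back-of-envelope check of the namespace limit: each namespace
# entry occupies 628 bytes in the .ns file, so a 16 megabyte file
# holds at most 16 MB / 628 entries; the "approximately 24,000"
# figure above is the usable portion of that upper bound.
NS_ENTRY_BYTES = 628
ns_file_bytes = 16 * 1024 * 1024

print(ns_file_bytes // NS_ENTRY_BYTES)  # 26715 slots at most
```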
Indexed items can be no larger than 1024 bytes. This value is the indexed content (i.e. the field value, or compound field value.)
A single collection can have no more than 64 indexes.
The names of indexes, including their namespace (i.e. database and collection name) cannot be longer than 128 characters. The default index name is the concatenation of the field names and index directions.
You can explicitly specify a name with the ensureIndex() helper if the default index name is too long.
MongoDB does not support unique indexes across shards, except when the unique index contains the full shard key as a prefix of the index. In these situations MongoDB will enforce uniqueness across the full key, not a single field.
See
Enforce Unique Keys for Sharded Collections for an alternate approach.
There can be no more than 31 fields in a compound index.
Replica sets can have no more than 12 members.
Only 7 members of a replica set can have votes at any given time. See Non-Voting Members for more information.
MongoDB will only return sorted results on fields without an index if the sort operation uses less than 32 megabytes of memory.
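The following illustrative Python sketch (not server internals) estimates whether an unindexed sort stays under the 32 megabyte in-memory limit:

```python
# Illustrative sketch (not server internals): estimating whether an
# unindexed sort stays under the 32 megabyte in-memory limit.
SORT_MEMORY_LIMIT = 32 * 1024 * 1024

def fits_in_memory_sort(doc_count, avg_doc_bytes):
    return doc_count * avg_doc_bytes < SORT_MEMORY_LIMIT

print(fits_in_memory_sort(10000, 1024))   # True  (~10 MB)
print(fits_in_memory_sort(100000, 1024))  # False (~100 MB: add an index)
```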
The group command does not work with sharding. Use mapReduce or aggregate instead.
db.eval() is incompatible with sharded collections. You may use db.eval() with un-sharded collections in a sharded cluster.
$where does not permit references to the db object from the $where function. This is uncommon in un-sharded collections.
The $atomic update modifier does not work in sharded environments.
$snapshot queries do not work in sharded environments.
See
$or and 2d Geospatial Indexes.
The dot (i.e. .) character is not permissible in database names.
Database names are case sensitive even if the underlying file system is case insensitive.
Changed in version 2.2: For MongoDB instances running on Windows.
In 2.2 the following characters are not permissible in database names:
/\. "*<>:|?
See Restrictions on Database Names for Windows for more information.
New in version 2.2.
Collection names should begin with an underscore or a letter character, and cannot:
See Are there any restrictions on the names of Collections? and Restrictions on Collection Names for more information.
Field names cannot contain dots (i.e. .), dollar signs (i.e. $), or null characters. See Dollar Sign Operator Escaping for an alternate approach.
MongoDB import and export utilities (i.e. mongoimport and mongoexport) and MongoDB REST Interfaces render an approximation of MongoDB BSON documents in JSON format.
The REST interface supports three different modes for document output:
MongoDB can process all of these representations in REST input.
Special representations of BSON data in JSON format make it possible to render information that has no obvious corresponding JSON. In some cases MongoDB supports multiple equivalent representations of the same type information. Consider the following representations:
The representations for each BSON data type follow, in Strict mode, JavaScript mode, and mongo shell mode.

Binary

Strict mode, JavaScript mode, and mongo shell mode all use:

{ "$binary": "<bindata>", "$type": "<t>" }

<bindata> is the base64 representation of a binary string. <t> is the hexadecimal representation of a single byte that indicates the data type.

Date

Strict mode:

{ "$date": <date> }

JavaScript mode and mongo shell mode:

Date( <date> )

<date> is the JSON representation of a 64-bit signed integer for milliseconds since epoch (unsigned before version 1.9.1).

Timestamp

Strict mode and JavaScript mode:

{ "$timestamp": { "t": <t>, "i": <i> } }

mongo shell mode:

Timestamp( <t>, <i> )

<t> is the JSON representation of a 32-bit unsigned integer for seconds since epoch. <i> is a 32-bit unsigned integer for the increment.

Regular Expression

Strict mode:

{ "$regex": "<sRegex>", "$options": "<sOptions>" }

JavaScript mode and mongo shell mode:

/<jRegex>/<jOptions>

<sRegex> is a string of valid JSON characters. <jRegex> is a string that may contain valid JSON characters and unescaped double quote (") characters, but may not contain unescaped forward slash (/) characters. <sOptions> is a string containing the regex options represented by the letters of the alphabet. <jOptions> is a string that may contain only the characters ‘g’, ‘i’, ‘m’ and ‘s’ (added in v1.9). Because the JavaScript and mongo shell representations support a limited range of options, any nonconforming options will be dropped when converting to this representation.

ObjectId

Strict mode and JavaScript mode:

{ "$oid": "<id>" }

mongo shell mode:

ObjectId( "<id>" )

<id> is a 24-character hexadecimal string.

DB Reference

Strict mode and JavaScript mode:

{ "$ref": "<name>", "$id": "<id>" }

mongo shell mode:

DBRef("<name>", "<id>")

<name> is a string of valid JSON characters. <id> is a 24-character hexadecimal string. In the Strict and JavaScript representations, the Strict representation of an ObjectId can be used as the $id value.

Undefined Type

Strict mode:

{ "$undefined": true }

JavaScript mode and mongo shell mode:

undefined

This is the representation for the JavaScript/BSON undefined type.
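As an illustrative Python sketch, the Strict-mode form of a BSON date can be produced by converting a timestamp to milliseconds since the Unix epoch:

```python
# Illustrative sketch: producing the Strict-mode representation of
# a BSON date, {"$date": <milliseconds since the Unix epoch>}.
from datetime import datetime, timezone

def to_strict_date(dt):
    return {"$date": int(dt.timestamp() * 1000)}

dt = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
print(to_strict_date(dt))  # {'$date': 1000}
```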
The MongoDB aggregation framework provides a means to calculate aggregate values without having to use map-reduce.
See also
A member of a replica set that exists solely to vote in elections. Arbiter nodes do not replicate data.
See also
A serialization format used to store documents and make remote procedure calls in MongoDB. “BSON” is a portmanteau of the words “binary” and “JSON”. Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents. For a detailed spec, see bsonspec.org.
See also
The Data Type Fidelity section.
The set of types supported by the BSON serialization format. The following types are available:
| Type | Number |
| Double | 1 |
| String | 2 |
| Object | 3 |
| Array | 4 |
| Binary data | 5 |
| Object id | 7 |
| Boolean | 8 |
| Date | 9 |
| Null | 10 |
| Regular Expression | 11 |
| JavaScript | 13 |
| Symbol | 14 |
| JavaScript (with scope) | 15 |
| 32-bit integer | 16 |
| Timestamp | 17 |
| 64-bit integer | 18 |
| Min key | 255 |
| Max key | 127 |
A fixed-sized collection. Once they reach their fixed size, capped collections automatically overwrite their oldest entries. MongoDB’s oplog replication mechanism depends on capped collections. Developers may also use capped collections in their applications.
See also
The Capped Collections wiki page.
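The overwrite behavior of a capped collection can be modeled with a bounded deque, as in this illustrative Python sketch:

```python
# Illustrative sketch of capped-collection behavior using a bounded
# deque: once the collection is full, each insert overwrites the
# oldest entry, preserving insertion order for the remainder.
from collections import deque

capped = deque(maxlen=3)
for n in range(5):
    capped.append({"n": n})

print([doc["n"] for doc in capped])  # [2, 3, 4]
```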
Collections are groupings of BSON documents. Collections do not enforce a schema, but they are otherwise mostly analogous to RDBMS tables.
The documents within a collection may not need the exact same set of fields, but typically all documents in a collection have a similar or related purpose for an application.
All collections exist within a single database. The namespace for collections within a database is flat.
See What is a namespace in MongoDB? and BSON Documents for more information.
A property that allows clients to address nodes in a system based upon their location.
Replica sets implement data-center awareness using tagging.
See also
members[n].tags and Tag Sets for more information about tagging and replica sets.
Any MongoDB operation other than an insert, update, remove, or query. MongoDB exposes commands as queries against the special $cmd collection. For example, the implementation of count for MongoDB is a command.
See also
Command Reference for a full list of database commands in MongoDB
A tool that, when enabled, keeps a record of all long-running operations in a database’s system.profile collection. The profiler is most often used to diagnose slow queries.
See also
Refers to the location of MongoDB’s data file storage. The default dbpath is /data/db. Other common data paths include /srv/mongodb and /var/lib/mongodb.
A member of a replica set that cannot become primary and applies operations at a specified delay. This delay is useful for protecting data from human error (i.e. unintentionally deleted databases) or updates that have unforeseen effects on the production database.
See also
mongod can create a verbose log of operations with the mongod --diaglog option or through the diagLogging command. The mongod creates this log in the directory specified by mongod --dbpath. The name of the log file is diaglog.<time-in-hex>, where <time-in-hex> reflects the initiation time of logging as a hexadecimal string.
Warning
Setting the diagnostic level to 0 will cause mongod to stop writing data to the diagnostic log file. However, the mongod instance will continue to keep the file open, even if it is no longer writing data to the file. If you want to rename, move, or delete the diagnostic log you must cleanly shut down the mongod instance before doing so.
See also
mongod --diaglog, diaglog, and diagLogging.
MongoDB uses the dot notation to access the elements of an array and to access the fields of a subdocument.
To access an element of an array by the zero-based index position, you concatenate the array name with the dot (.) and zero-based index position:
'<array>.<index>'
To access a field of a subdocument with dot-notation, you concatenate the subdocument name with the dot (.) and the field name:
'<subdocument>.<field>'
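Dot-notation resolution can be modeled outside MongoDB, as in this illustrative Python sketch that walks a path one segment at a time, treating numeric segments as array indexes:

```python
# Illustrative sketch: resolving a MongoDB-style dot-notation path
# against a plain Python structure, treating numeric path segments
# as zero-based array indexes.
def resolve(doc, path):
    for part in path.split("."):
        doc = doc[int(part)] if isinstance(doc, list) else doc[part]
    return doc

doc = {"tags": ["db", "nosql"], "info": {"type": "database"}}
print(resolve(doc, "tags.1"))     # nosql
print(resolve(doc, "info.type"))  # database
```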
The process of removing or “shedding” chunks from one shard to another. Administrators must drain shards before removing them from the cluster.
See also
A client implementing the communication protocol required for talking to a server. The MongoDB drivers provide language-idiomatic methods for interfacing with MongoDB.
See also
In the context of replica sets, an election is the process by which members of a replica set select primary nodes on startup and in the event of failures.
See also
In the context of the aggregation framework, expressions are the stateless transformations that operate on the data that passes through the pipeline.
See also
The process that allows one of the secondary nodes in a replica set to become primary in the event of a failure.
See also
A convention for storing large files in a MongoDB database. All of the official MongoDB drivers support this convention, as does the mongofiles program.
See also
In the context of geospatial queries, haystack indexes enhance searches by creating “buckets” of objects grouped by a second criterion. For example, you might want all geographical searches to also include the type of location being searched for. In this case, you can create a haystack index that includes a document’s position and type (the bucketSize option is required when creating a haystack index):
db.places.ensureIndex( { position: "geoHaystack", type: 1 }, { bucketSize: 1 } )
You can then query on position and type:
db.places.find( { position: [34.2, 33.3], type: "restaurant" } )
A member of a replica set that cannot become primary and is not advertised as part of the set in the database command isMaster, which prevents it from receiving read-only queries depending on read preference.
See also
Hidden Member, isMaster, db.isMaster, and members[n].hidden.
A sequential, binary transaction log used to bring the database into a consistent state in the event of a hard shutdown. MongoDB enables journaling by default for 64-bit builds of MongoDB version 2.0 and newer. Journal files are pre-allocated and will exist as three 1GB files in the data directory. To make journal files smaller, use smallfiles.
When enabled, MongoDB writes data first to the journal and then to the core data files. MongoDB commits to the journal every 100 milliseconds, and this interval is configurable using the journalCommitInterval runtime option.
To force mongod to commit to the journal more frequently, you can specify j:true. When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.
See also
The Journaling wiki page.
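The commit-interval rule above can be sketched in a few lines of illustrative Python:

```python
# Illustrative sketch of the rule above: with a j:true write
# pending, mongod commits at a third of journalCommitInterval.
def effective_commit_interval_ms(configured_ms=100, j_true_pending=False):
    return configured_ms // 3 if j_true_pending else configured_ms

print(effective_commit_interval_ms())                     # 100
print(effective_commit_interval_ms(j_true_pending=True))  # 33
```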
A JSON document is a collection of fields and values in a structured format. The following is a sample JSON document with two fields:
{ name: "MongoDB",
type: "database" }
A data and processing and aggregation paradigm consisting of a “map” phase that selects data, and a “reduce” phase that transforms the data. In MongoDB, you can run arbitrary aggregations over data using map-reduce.
See also
The Map Reduce wiki page for more information regarding MongoDB’s map-reduce implementation, and Aggregation Framework for another approach to data aggregation in MongoDB.
The MongoDB Shell. mongo connects to mongod and mongos instances, allowing administration, management, and testing. mongo has a JavaScript interface.
See also
The program implementing the MongoDB database server. This server typically runs as a daemon.
See also
The routing and load balancing process that acts as an interface between an application and a MongoDB sharded cluster.
See also
The order in which a database stores documents on disk. Typically, the order of documents on disk reflects insertion order, except when documents move internally because of document growth due to update operations. However, capped collections guarantee that insertion order and natural order are identical.
When you execute find() with no parameters, the database returns documents in forward natural order. When you execute find() and include sort() with a parameter of $natural:-1, the database returns documents in reverse natural order.
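As an illustrative Python sketch, forward natural order is simply insertion order, and $natural: -1 reverses it:

```python
# Illustrative sketch: forward natural order is insertion order;
# sorting by {$natural: -1} returns the documents reversed.
docs = [{"_id": i} for i in range(3)]  # insertion order

forward = docs                  # find() with no sort
reverse = list(reversed(docs))  # find().sort({ $natural: -1 })

print([d["_id"] for d in forward])  # [0, 1, 2]
print([d["_id"] for d in reverse])  # [2, 1, 0]
```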
A keyword beginning with a $ used to express a complex query, update, or data transformation. For example, $gt is the query language’s “greater than” operator. See the Operator Reference for more information about the available operators.
See also
A capped collection that stores an ordered history of logical writes to a MongoDB database. The oplog is the basic mechanism enabling replication in MongoDB.
See also
The event that occurs when a process requests stored data (i.e. a page) from memory that the operating system has moved to disk.
See also
The series of operations in the aggregation process.
See also
A per-collection setting that changes and normalizes the way that MongoDB allocates space for each document, in an effort to maximize storage reuse and reduce fragmentation. This is the default for TTL Collections. See collMod and usePowerOf2Sizes for more information.
New in version 2.2.
In the context of replica sets, priority is a configurable value that helps determine which nodes in a replica set are most likely to become primary.
See also
For each query, the MongoDB query optimizer generates a query plan that matches the query to the index that produces the fastest results. The optimizer then uses the query plan each time the mongod receives the query. If a collection changes significantly, the optimizer creates a new query plan.
See also
A setting on the MongoDB drivers that determines how the clients direct read operations. Read preference affects all replica sets including shards. By default, drivers direct all reads to primary nodes for strict consistency. However, you may also direct reads to secondary nodes for eventually consistent reads.
See also
The precursor to the MongoDB replica sets.
Deprecated since version 1.6.
A cluster of MongoDB servers that implements master-slave replication and automated failover. MongoDB’s recommended replication strategy.
See also
A feature allowing multiple database servers to share the same data, thereby ensuring redundancy and facilitating load balancing. MongoDB supports two flavors of replication: master-slave replication and replica sets.
See also
replica set, sharding, Replication. and Replication Fundamentals.
The length of time between the last operation in the primary’s oplog and the last operation applied to a particular secondary or slave node. In general, you want to keep replication lag as small as possible.
See also
In the context of a replica set, the set name refers to an arbitrary name given to a replica set when it’s first configured. All members of a replica set must have the same name specified with the replSet setting (or --replSet option for mongod.)
See also
A single replica set that stores some portion of a sharded cluster’s total data set. See sharding.
See also
The documents in the Sharding section of manual.
The set of nodes comprising a sharded MongoDB deployment. A sharded cluster consists of three config processes, one or more replica sets, and one or more mongos routing processes.
See also
The documents in the Sharding section of manual.
A database architecture that enables horizontal scaling by splitting data into key ranges among two or more replica sets. This architecture is also known as “range-based partitioning.” See shard.
See also
The documents in the Sharding section of manual.
A number of database commands have “helper” methods in the mongo shell that provide a more concise syntax and improve the general interactive experience.
See also
Specifies whether a write operation has succeeded. Write concern allows your application to detect insertion errors or unavailable mongod instances. For replica sets, you can configure write concern to confirm replication to a specified number of members.
See also
Write Concern, Write Operations, and Write Concern for Replica Sets.
See also
The Index may provide useful insight into the reference material in this manual.
Always install the latest, stable version of MongoDB. See the following release notes for an account of the changes in major versions. Release notes also include instructions for upgrade.
Current stable release (v2.2-series):
See the full index of this page for a complete list of changes included in 2.2.
MongoDB 2.2 is a production release series and succeeds the 2.0 production release series.
MongoDB 2.0 data files are compatible with 2.2-series binaries without any special migration process. However, always perform the upgrade process for replica sets and sharded clusters using the procedures that follow.
Always upgrade to the latest point release in the 2.2 release series. Currently the latest release of MongoDB is 2.2.2.
For mongod instances, 2.2 is a drop-in replacement for 2.0 and 1.8.
Check your driver documentation for information regarding required compatibility upgrades, and always run the recent release of your driver.
Typically, only users running with authentication will need to upgrade drivers before continuing with the upgrade to 2.2.
For all deployments using authentication, upgrade the drivers (i.e. client libraries), before upgrading the mongod instance or instances.
For all upgrades of sharded clusters:
Other than the above restrictions, 2.2 processes can interoperate with 2.0 and 1.8 tools and processes. You can safely upgrade the mongod and mongos components of a deployment one by one while the deployment is otherwise operational. Be sure to read the detailed upgrade procedures below before upgrading production systems.
| [1] | To minimize the interruption caused by election process, always upgrade the secondaries of the set first, then step down the primary, and then upgrade the primary. |
You can upgrade to 2.2 by performing a “rolling” upgrade of the set by upgrading the members individually while the other members are available to minimize downtime. Use the following procedure:
Upgrade the secondary members of the set one at a time by shutting down the mongod and replacing the 2.0 binary with the 2.2 binary. After upgrading a mongod instance, wait for the member to recover to SECONDARY state before upgrading the next instance. To check the member’s state, issue rs.status() in the mongo shell.
Use the mongo shell method rs.stepDown() to step down the primary to allow the normal failover procedure. rs.stepDown() expedites the failover procedure and is preferable to shutting down the primary directly.
Once the primary has stepped down and another member has assumed PRIMARY state, as observed in the output of rs.status(), shut down the previous primary and replace mongod binary with the 2.2 binary and start the new process.
Note
Replica set failover is not instant; it renders the set unable to accept writes and may interrupt reads until the failover process completes. Typically this takes 10 seconds or more. You may wish to plan the upgrade during a predefined maintenance window.
Use the following procedure to upgrade a sharded cluster:
Note
Balancing is not currently supported in mixed 2.0.x and 2.2.0 deployments. Thus you will want to reach a consistent version for all shards within a reasonable period of time, e.g. same-day. See SERVER-6902 for more information.
The aggregation framework makes it possible to do aggregation operations without needing to use map-reduce. The aggregate command exposes the aggregation framework, and the db.collection.aggregate() helper in the mongo shell provides an interface to these operations. Consider the following resources for background on the aggregation framework and its use:
TTL collections remove expired data from a collection using a special index and a background thread that deletes expired documents every minute. These collections are useful as an alternative to capped collections in some cases, such as data warehousing and caching, including machine-generated event data, logs, and session information that needs to persist in a database only for a limited period of time.
For more information, see the Expire Data from Collections by Setting TTL tutorial.
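The TTL thread's selection rule can be sketched in illustrative Python: on each pass, documents whose indexed timestamp is older than expireAfterSeconds are removed (the field name createdAt here is an example, not a required name):

```python
# Illustrative sketch of the TTL thread's selection rule: documents
# whose indexed timestamp is older than expireAfterSeconds are
# removed on each pass. The field name "createdAt" is an example.
from datetime import datetime, timedelta

def expired(docs, now, expire_after_seconds):
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [d for d in docs if d["createdAt"] < cutoff]

now = datetime(2012, 10, 12, 12, 0, 0)
docs = [
    {"_id": 1, "createdAt": now - timedelta(hours=2)},
    {"_id": 2, "createdAt": now - timedelta(minutes=5)},
]
print([d["_id"] for d in expired(docs, now, 3600)])  # [1]
```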
MongoDB 2.2 increases the server’s capacity for concurrent operations with the following improvements:
To reflect these changes, MongoDB now provides changed and improved reporting for concurrency and use, see locks and recordStats in server status and see current operation output, db.currentOp(), mongotop, and mongostat.
MongoDB 2.2 adds additional support for geographic distribution or other custom partitioning for sharded collections in clusters. By using this “tag aware” sharding, you can automatically ensure that data in a sharded database system is always on specific shards. For example, with tag aware sharding, you can ensure that data is closest to the application servers that use that data most frequently.
Shard tagging controls data location, and is complementary but separate from replica set tagging, which controls read preference and write concern. For example, shard tagging can pin all “USA” data to one or more logical shards, while replica set tagging can control which mongod instances (e.g. “production” or “reporting”) the application uses to service requests.
See the documentation for the following helpers in the mongo shell that support tagged sharding configuration:
Also, see the wiki page for tag aware sharding.
All MongoDB clients and drivers now support full read preferences, including consistent support for a full range of read preference modes and tag sets. This support extends to the mongos and applies identically to single replica sets and to the replica sets for each shard in a sharded cluster.
Additional read preference support now exists in the mongo shell using the readPref() cursor method.
MongoDB 2.2 provides more reliable and robust support for authentication clients, including drivers and mongos instances.
If your cluster runs with authentication:
In version 2.2, for upsert operations, findAndModify commands will now return the following output:
{'ok': 1.0, 'value': null}
In the mongo shell, findAndModify operations running as upserts will only output a null value.
Previously, in version 2.0 these operations would return an empty document, e.g. { }.
See: SERVER-6226 for more information.
If you use the mongodump tool from the 2.2 distribution to create a dump of a database, you may only restore that dump to a 2.2 database.
See: SERVER-6961 for more information.
In version 2.2, the ObjectId.toString() method returns the string representation of the ObjectId() object and has the format ObjectId("...").
Consider the following example that calls the toString() method on the ObjectId("507c7f79bcf86cd7994f6c0e") object:
ObjectId("507c7f79bcf86cd7994f6c0e").toString()
The method now returns the string ObjectId("507c7f79bcf86cd7994f6c0e").
Previously, in version 2.0, the method would return the hexadecimal string 507c7f79bcf86cd7994f6c0e.
If compatibility between versions 2.0 and 2.2 is required, use ObjectId().str, which holds the hexadecimal string value in both versions.
In version 2.2, the ObjectId.valueOf() method returns the value of the ObjectId() object as a lowercase hexadecimal string.
Consider the following example that calls the valueOf() method on the ObjectId("507c7f79bcf86cd7994f6c0e") object:
ObjectId("507c7f79bcf86cd7994f6c0e").valueOf()
The method now returns the hexadecimal string 507c7f79bcf86cd7994f6c0e.
Previously, in version 2.0, the method would return the object ObjectId("507c7f79bcf86cd7994f6c0e").
If compatibility between versions 2.0 and 2.2 is required, use ObjectId().str attribute, which holds the hexadecimal string value in both versions.
In version 2.2, collection names cannot:
This change does not affect collections created with now illegal names in earlier versions of MongoDB.
These new restrictions are in addition to the existing restrictions on collection names which are:
Collection names may otherwise be any valid UTF-8 string.
See the SERVER-4442 and the Are there any restrictions on the names of Collections? FAQ item.
Database names running on Windows can no longer contain the following characters:
/\. "*<>:|?
The names of the data files include the database name. If you attempt to upgrade a database instance with one or more of these characters, mongod will refuse to start.
Change the name of these databases before upgrading. See SERVER-4584 and SERVER-6729 for more information.
All capped collections now have an _id field by default, if they exist outside of the local database, and now have indexes on the _id field. This change only affects capped collections created with 2.2 instances and does not affect existing capped collections.
See: SERVER-5516 for more information.
The $elemMatch operator allows applications to narrow the data returned from queries so that the query operation will only return the first matching element in an array. See the $elemMatch (projection) documentation and the SERVER-2238 and SERVER-828 issues for more information.
As of 2.2, MongoDB does not support Windows XP. Please upgrade to a more recent version of Windows to use the latest releases of MongoDB. See SERVER-5648 for more information.
You may now run mongos.exe instances as a Windows Service. See the mongos.exe reference and MongoDB as a Windows Service and SERVER-1589 for more information.
MongoDB for Windows now supports log rotation by way of the logRotate database command. See SERVER-2612 for more information.
Labeled “2008+” on the Downloads Page, this build is for 64-bit versions of Windows Server 2008 R2 and Windows 7 or newer, and offers increased performance over the standard 64-bit Windows build of MongoDB. See SERVER-3844 for more information.
When you specify the --collection option to mongodump, mongodump will now backup the definitions for all indexes that exist on the source database. When you attempt to restore this backup with mongorestore, the target mongod will rebuild all indexes. See SERVER-808 for more information.
mongorestore now includes the --noIndexRestore option to restore the previous behavior. Use --noIndexRestore to prevent mongorestore from rebuilding the dumped indexes.
The mongooplog tool makes it possible to pull oplog entries from one mongod instance and apply them to another mongod instance. You can use mongooplog to achieve point-in-time backup of a MongoDB data set. See the SERVER-3873 case and the mongooplog documentation.
mongotop and mongostat now contain support for username/password authentication. See SERVER-3875 and SERVER-3871 for more information regarding this change. Also consider the documentation of the following options for additional information:
mongoimport now provides an option to halt the import if the operation encounters an error, such as a network interruption, a duplicate key exception, or a write error. The --stopOnError option will produce an error rather than silently continue importing data. See SERVER-3937 for more information.
In mongorestore, the --w option provides support for configurable write concern.
You can now run mongodump when connected to a secondary member of a replica set. See SERVER-3854 for more information.
Previously, mongoimport would only import documents that were less than 4 megabytes in size. This issue is now corrected: you may use mongoimport to import documents up to 16 megabytes in size. See SERVER-4593 for more information.
MongoDB extended JSON now includes a new Timestamp() type to represent the Timestamp type that MongoDB uses for timestamps in the oplog among other contexts.
This permits tools like mongooplog and mongodump to query for specific timestamps. Consider the following mongodump operation:
mongodump --db local --collection oplog.rs --query '{"ts":{"$gt":{"$timestamp" : {"t": 1344969612000, "i": 1 }}}}' --out oplog-dump
See SERVER-3483 for more information.
2.2 includes a number of changes that improve the overall quality and consistency of the user interface for the mongo shell:
The db.loadServerScripts() loads the contents of the current database’s system.js collection into the current mongo shell session. See SERVER-1651 for more information.
If you pass an array of documents to the insert() method, the mongo shell will now perform a bulk insert operation. See SERVER-3819 and SERVER-2395 for more information.
See the SERVER-2957 case and the documentation of the syslog run-time option or the mongod --syslog and mongos --syslog command line-options.
Added the touch command to read the data and/or indexes from a collection into memory. See: SERVER-2023 and touch for more information.
indexCounters now report actual counters that reflect index use and state. In previous versions, these data were sampled. See SERVER-5784 and indexCounters for more information.
See the documentation of the compact and the SERVER-4018 issue for more information.
The Boost library, version 1.49, is now embedded in the MongoDB code base.
If you want to build MongoDB binaries using system Boost libraries, you can pass scons using the --use-system-boost flag, as follows:
scons --use-system-boost
When building MongoDB, you can also pass scons a flag to compile MongoDB using only system libraries rather than the included versions of the libraries. For example:
scons --use-system-all
See the SERVER-3829 and SERVER-5172 issues for more information.
To improve performance, MongoDB 2.2 uses the TCMalloc memory allocator from Google Perftools. For more information about this change see the SERVER-188 and SERVER-4683. For more information about TCMalloc, see the documentation of TCMalloc itself.
When secondary members of a replica set fall behind in replication, mongod now provides better reporting in the log. This makes it possible to track replication in general and identify what process may produce errors or halt replication. See SERVER-3575 for more information.
The new replSetSyncFrom command and new rs.syncFrom() helper in the mongo shell make it possible for you to manually configure from which member of the set a replica will poll oplog entries. Use these commands to override the default selection logic if needed. Always exercise caution with replSetSyncFrom when overriding the default behavior.
To prevent inconsistency between members of replica sets, if a member of a replica set has members[n].buildIndexes set to false, other members of the replica set will not sync from this member unless they also have members[n].buildIndexes set to false. See SERVER-4160 for more information.
By default, when replicating operations, secondaries will pre-fetch indexes associated with an operation to improve replication throughput in most cases. The replIndexPrefetch setting and --replIndexPrefetch option allow administrators to disable this feature or allow the mongod to pre-fetch only the index on the _id field. See SERVER-6718 for more information.
In 2.2 Map Reduce received the following improvements:
If your shard key is a prefix of an existing index, then you do not need to maintain a separate index for your shard key in addition to your existing index. This index, however, cannot be a multi-key index. See the “Shard Key Indexes” documentation and SERVER-1506 for more information.
The migration thresholds have changed in 2.2 to permit more even distribution of chunks in collections that have smaller quantities of data. See the Migration Thresholds documentation for more information.
Added License notice for Google Perftools (TCMalloc Utility). See the License Notice and the SERVER-4683 for more information.
See Changes in MongoDB 2.2 for an overview of all changes in 2.2.
Previous stable releases:
See the full index of this page for a complete list of changes included in 2.0.
Although the major version number has changed, MongoDB 2.0 is a standard, incremental production release and works as a drop-in replacement for MongoDB 1.8.
Read through all release notes before upgrading, and ensure that no changes will affect your deployment.
If you create new indexes in 2.0, then downgrading to 1.8 is possible but you must reindex the new collections.
mongoimport and mongoexport now correctly adhere to the CSV spec for handling CSV input/output. This may break existing import/export workflows that relied on the previous behavior. For more information see SERVER-1097.
Journaling is enabled by default in 2.0 for 64-bit builds. If you still prefer to run without journaling, start mongod with the --nojournal run-time option. Otherwise, MongoDB creates journal files during startup. The first time you start mongod with journaling, you will see a delay while mongod creates the new journal files. In addition, you may see reduced write throughput.
2.0 mongod instances are interoperable with 1.8 mongod instances; however, for best results, upgrade your deployments using the following procedures:
Upgrade the secondary members of the set one at a time by shutting down the mongod and replacing the 1.8 binary with the 2.0.x binary from the MongoDB Download Page.
To avoid losing the last few updates on failover you can temporarily halt your application (failover should take less than 10 seconds), or you can set write concern in your application code to confirm that each update reaches multiple servers.
Use rs.stepDown() to step down the primary and allow the normal failover procedure.
rs.stepDown() and replSetStepDown provide for shorter and more consistent failover procedures than simply shutting down the primary directly.
When the primary has stepped down, shut down its instance and upgrade by replacing the mongod binary with the 2.0.x binary.
A compact command is now available for compacting a single collection and its indexes. Previously, the only way to compact was to repair the entire database.
When going to disk, the server will yield the write lock when writing data that is not likely to be in memory. The initial implementation of this feature now exists:
See SERVER-2563 for more information.
The specific operations yield in 2.0 are:
MongoDB 2.0 reduces the default stack size. This change can reduce total memory usage when there are many (e.g., 1000+) client connections, as there is a thread per connection. While portions of a thread’s stack can be swapped out if unused, some operating systems do this slowly enough that it might be an issue. The default stack size is the lesser of the system setting or 1 MB.
v2.0 includes significant improvements to the index structures. Indexes are often 25% smaller and 25% faster (depends on the use case). When upgrading from previous versions, the benefits of the new index type are realized only if you create a new index or re-index an old one.
Dates are now signed, and the max index key size has increased slightly from 819 to 1024 bytes.
All operations that create a new index will result in a 2.0 index by default. For example:
To convert all indexes for a given collection to the 2.0 type, invoke the compact command.
Once you create new indexes, downgrading to 1.8.x will require a re-index of any indexes created using 2.0. See Build Old Style Indexes.
Applications can now use authentication with sharded clusters.
Each replica set member can now have a priority value consisting of a floating-point number from 0 to 1000, inclusive. Priorities let you control which member of the set you prefer to have as primary: the member with the highest priority that can see a majority of the set will be elected primary.
For example, suppose you have a replica set with three members, A, B, and C, and suppose that their priorities are set as follows:
During normal operation, the set will always choose B as primary. If B becomes unavailable, the set will elect A as primary.
For more information, see the Member Priority documentation.
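The selection rule can be sketched in plain JavaScript. This is a simplified model with example priorities chosen to match the outcome described above; the real election also involves votes, heartbeats, and optimes, and electPrimary is a hypothetical helper:

```javascript
// Simplified model of primary selection: among members that can see a
// majority of the set, the highest-priority member wins.
function electPrimary(members) {
  const eligible = members.filter(m => m.priority > 0 && m.seesMajority);
  if (eligible.length === 0) return null;
  return eligible.reduce((a, b) => (b.priority > a.priority ? b : a)).name;
}

// Illustrative priorities: B outranks A, which outranks C.
const set = [
  { name: "A", priority: 2,   seesMajority: true },
  { name: "B", priority: 3,   seesMajority: true },
  { name: "C", priority: 0.5, seesMajority: true }
];
console.log(electPrimary(set)); // B
set[1].seesMajority = false;    // B becomes unavailable
console.log(electPrimary(set)); // A
```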
You can now “tag” replica set members to indicate their location. You can use these tags to design custom write rules across data centers, racks, specific servers, or any other architecture choice.
For example, an administrator can define rules such as “very important write” or customerData or “audit-trail” to replicate to certain servers, racks, data centers, etc. Then in the application code, the developer would say:
db.foo.insert(doc, {w : "very important write"})
which would succeed if it fulfilled the conditions the DBA defined for “very important write”.
For more information, see Tagging.
Drivers may also support tag-aware reads. Instead of specifying slaveOk, you specify slaveOk with tags indicating which data-centers to read from. For details, see the Drivers documentation.
You can also set w to majority to ensure that the write propagates to a majority of nodes, effectively committing it. The value for “majority” will automatically adjust as you add or remove nodes from the set.
For more information, see Write Concern.
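As a sketch, the number of members needed for a majority adjusts with set size (majority is a hypothetical helper, assuming every member votes):

```javascript
// Smallest number of members that constitutes a majority of the set.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

console.log(majority(3)); // 2
console.log(majority(5)); // 3: adding members raises the bar automatically
```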
If the majority of servers in a set has been permanently lost, you can now force a reconfiguration of the set to bring it back online.
For more information see Reconfigure a Replica Set with Unavailable Members.
To minimize time without a primary, the rs.stepDown() method will now fail if the primary does not see a secondary within 10 seconds of its latest optime. You can force the primary to step down anyway, but by default it will return an error message.
See also Force a Member to Become Primary.
When you call the shutdown command, the primary will refuse to shut down unless there is a secondary whose optime is within 10 seconds of the primary. If such a secondary isn’t available, the primary will step down and wait up to a minute for the secondary to be fully caught up before shutting down.
Note that to get this behavior, you must issue the shutdown command explicitly; sending a signal to the process will not trigger this behavior.
You can also force the primary to shut down, even without an up-to-date secondary available.
Indexing is now supported on documents which have multiple location objects, embedded either inline or in nested sub-documents. Additional command options are also supported, allowing results to return with not only distance but the location used to generate the distance.
For more information, see Multi-location Documents.
Set the continueOnError option for bulk inserts, in the driver, so that bulk insert will continue to insert any remaining documents even if an insert fails, as is the case with duplicate key exceptions or network interruptions. The getLastError command will report whether any inserts have failed, not just the last one. If multiple errors occur, the client will only receive the most recent getLastError results.
See OP_INSERT.
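The semantics can be sketched outside the driver in plain JavaScript; bulkInsert and the simulated duplicate-key error string are hypothetical illustrations, not driver API:

```javascript
// Simulate continueOnError bulk-insert semantics: keep inserting after a
// failure, and report only the most recent error (as getLastError would).
function bulkInsert(existingIds, docs) {
  let lastError = null;
  for (const d of docs) {
    if (existingIds.has(d._id)) {
      lastError = "E11000 duplicate key error: _id " + d._id; // simulated
      continue; // continueOnError: keep going
    }
    existingIds.add(d._id);
  }
  return lastError; // null if every insert succeeded
}

const ids = new Set([2]);
const err = bulkInsert(ids, [{ _id: 1 }, { _id: 2 }, { _id: 3 }]);
console.log(err);        // E11000 duplicate key error: _id 2
console.log(ids.has(3)); // true: documents after the failure were inserted
```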
Using the new sharded flag, it is possible to send the result of a map/reduce to a sharded collection. Combined with the reduce or merge flags, it is possible to keep adding data to very large collections from map/reduce jobs.
For more information, see MapReduce Output Options and mapReduce.
Map/reduce performance will benefit from the following:
Allows the dot (.) to match all characters including new lines. This is in addition to the currently supported i, m and x. See Regular Expressions and $regex.
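JavaScript regular expressions have an equivalent s (dotAll) flag, so the behavior can be demonstrated outside the database:

```javascript
// With the "s" option, the dot matches newline characters as well.
const withS = /first.second/s;   // analogous to $options: "s"
const withoutS = /first.second/;
const text = "first\nsecond";
console.log(withS.test(text));    // true
console.log(withoutS.test(text)); // false
```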
The output of the validate command and the documents in the system.profile collection have both been enhanced to return information as BSON objects with keys for each value rather than as free-form strings.
You can define a custom prompt for the mongo shell. You can change the prompt at any time by setting the prompt variable to a string or a custom JavaScript function returning a string. For examples, see Custom Prompt.
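For example, a prompt function might look like the following (a sketch; the prompt string is arbitrary, and the assignment is tested here as plain JavaScript rather than in a live shell):

```javascript
// Assign a function returning a string to the `prompt` variable.
// In the mongo shell this function runs before each prompt is displayed.
var prompt = function () {
  return "myReplSet> ";
};

console.log(prompt()); // prints "myReplSet> "
```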
On startup, the shell will check for a .mongorc.js file in the user’s home directory. The shell will execute this file after connecting to the database and before displaying the prompt.
If you would like the shell not to run the .mongorc.js file automatically, start the shell with --norc.
For more information, see mongo.
See the full index of this page for a complete list of changes included in 1.8.
MongoDB 1.8 is a standard, incremental production release and works as a drop-in replacement for MongoDB 1.6, except:
Read through all release notes before upgrading and ensure that no changes will affect your deployment.
1.8.x secondaries can replicate from 1.6.x primaries.
1.6.x secondaries cannot replicate from 1.8.x primaries.
Thus, to upgrade a replica set you must replace all of your secondaries first, then the primary.
For example, suppose you have a replica set with a primary, an arbiter and several secondaries. To upgrade the set, do the following:
For the arbiter:
Change your config (optional) to prevent election of a new primary.
It is possible that, when you start shutting down members of the set, a new primary will be elected. To prevent this, you can give all of the secondaries a priority of 0 before upgrading, and then change them back afterwards. To do so:
Record your current config. Run rs.config() and paste the results into a text file.
Update your config so that all secondaries have priority 0. For example:
config = rs.conf()
{
    "_id" : "foo",
    "version" : 3,
    "members" : [
        {
            "_id" : 0,
            "host" : "ubuntu:27017"
        },
        {
            "_id" : 1,
            "host" : "ubuntu:27018"
        },
        {
            "_id" : 2,
            "host" : "ubuntu:27019",
            "arbiterOnly" : true
        },
        {
            "_id" : 3,
            "host" : "ubuntu:27020"
        },
        {
            "_id" : 4,
            "host" : "ubuntu:27021"
        }
    ]
}
config.version++
3
rs.isMaster()
{
    "setName" : "foo",
    "ismaster" : false,
    "secondary" : true,
    "hosts" : [
        "ubuntu:27017",
        "ubuntu:27018"
    ],
    "arbiters" : [
        "ubuntu:27019"
    ],
    "primary" : "ubuntu:27018",
    "ok" : 1
}
// for each secondary
config.members[0].priority = 0
config.members[3].priority = 0
config.members[4].priority = 0
rs.reconfig(config)
For each secondary:
If you changed the config, change it back to its original state:
config = rs.conf()
config.version++
config.members[0].priority = 1
config.members[3].priority = 1
config.members[4].priority = 1
rs.reconfig(config)
Shut down the primary (the final 1.6 server), and then restart it with the 1.8.x binary from the MongoDB Download Page.
Turn off the balancer:
mongo <a_mongos_hostname>
use config
db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)
For each shard:
For each mongos:
For each config server:
Turn on the balancer:
use config
db.settings.update({_id:"balancer"},{$set : {stopped:false}})
If for any reason you must move back to 1.6, follow the steps above in reverse. Please be careful that you have not inserted any documents larger than 4MB while running on 1.8 (where the max size has increased to 16MB). If you have you will get errors when the server tries to read those documents.
Returning to 1.6 after using 1.8 journaling works fine, as journaling does not change anything about the data file format. Suppose you are running 1.8.x with journaling enabled and you decide to switch back to 1.6. There are two scenarios:
MongoDB now supports write-ahead journaling to facilitate fast crash recovery and durability in the storage engine. With journaling enabled, a mongod can be quickly restarted following a crash without needing to repair the collections.
Sparse Indexes are indexes that only include documents that contain the fields specified in the index. Documents missing the field will not appear in the index at all. This can significantly reduce index size for indexes of fields that contain only a subset of documents within a collection.
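The membership rule can be sketched in plain JavaScript (a simplified model: a sparse index skips documents that lack the field entirely, while a field explicitly set to null still counts as present):

```javascript
// Which documents would a sparse index on field "a" include?
const docs = [
  { _id: 1, a: 5 },
  { _id: 2 },          // no "a" field: excluded from a sparse index
  { _id: 3, a: null }  // field present (null): still included
];
const indexed = docs.filter(d => Object.prototype.hasOwnProperty.call(d, "a"));
console.log(indexed.map(d => d._id)); // [ 1, 3 ]
```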
Covered Indexes enable MongoDB to answer queries entirely from the index when the query only selects fields that the index contains.
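The idea can be sketched as a simple check (a simplified model; real covered queries have additional requirements, for example excluding the _id field from the projection unless it is part of the index):

```javascript
// A query is "covered" when every field it filters on and every field it
// returns is present in the index, so no document fetch is needed.
function isCovered(indexFields, queryFields, projectedFields) {
  const idx = new Set(indexFields);
  return [...queryFields, ...projectedFields].every(f => idx.has(f));
}

console.log(isCovered(["a", "b"], ["a"], ["b"])); // true
console.log(isCovered(["a"], ["a"], ["c"]));      // false: "c" forces a fetch
```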
The mapReduce command supports new options that enable incrementally updating existing collections. Previously, a MapReduce job could output either to a temporary collection or to a named permanent collection, which it would overwrite with new data.
You now have several options for the output of your MapReduce jobs:
For more information, see the out field options in the mapReduce document.
Current Development series:
MongoDB 2.4 is currently in development, as part of the 2.3 development release series. While 2.3-series releases are currently available, these versions of MongoDB are for testing only, and are not for production use under any circumstances.
Important
All interfaces and functionality described in this document are subject to change before the 2.4.0 release.
This document will eventually contain the full release notes for MongoDB 2.4; during the development cycle this document will contain documentation of new features and functionality only available in the 2.3 releases.
See the full index of this page for a complete list of changes included in 2.4.
You can download the 2.3 release on the downloads page in the Development Release (Unstable) section. There are no distribution packages for development releases, but you can use the binaries provided for testing purposes. See Install MongoDB on Linux, Install MongoDB on Windows, or Install MongoDB on OS X for the basic installation process.
Note
These features are only present in the MongoDB Subscriber Edition. To download 2.3 development releases of the Subscriber Edition, use the following resources:
An improved authentication system is a core focus of the entire 2.3 cycle. As of 2.3.1, the following components of the new authentication system are available for use in MongoDB:
Note
As of 2.3.1, mongos does not yet support SASL/Kerberos; support is forthcoming. Test Kerberos with standalone mongod instances and replica sets.
Development work on this functionality is ongoing, and additional related functionality is forthcoming. To use Kerberos with MongoDB as of the current 2.3-series release, consider the following requirements:
To start mongod with support for Kerberos, use the following form:
env KRB5_KTNAME=<path to keytab file> <mongod invocation>
You must start mongod with auth or keyfile, [1] so that an actual command would resemble:
env KRB5_KTNAME=/opt/etc/mongodb.keytab \
/opt/bin/mongod --auth --dbpath /opt/data/db --logpath /opt/log/mongod.log --fork
Replace the paths as needed for your test deployment.
To use Kerberos with the mongo shell, begin by initializing a Kerberos session with kinit. Then start a 2.3.1 or greater mongo shell instance, and run the following operations to associate the current connection with the Kerberos session:
db.getMongo().saslAuthenticate( { mechanism: "GSSAPI",
principal: "<username>@<REALM>" } )
The value of the principal field must be the same principal that you initialized with kinit. Continue to gain privileges using the acquirePrivilege command in an operation that resembles the following:
db.adminCommand( { acquirePrivilege: 1,
resource: <dbname>,
principal: <principalName>,
actions: [ <actionString> ] } )
Replace <dbname> with the name of the database for which you want privileges, and replace <principalName> with the Kerberos principal you initialized with kinit. The <actionString> list contains the privileges you are acquiring; currently this value must be either:
The oldRead action string corresponds to the “read only” privileges in the existing authentication system, while oldWrite corresponds to the existing “read/write” privileges.
See
| [1] | keyfile implies auth, and you must use keyfile for replica sets. |
The default JavaScript engine used throughout MongoDB, for the mongo shell, mapReduce, $where, and eval is now v8.
The interpreterVersion field of the document output by db.serverBuildInfo() in the mongo shell reports which JavaScript interpreter the mongod instance is running.
The interpreterVersion() in the mongo shell reports which JavaScript interpreter this mongo shell uses.
Note
In 2.3.2, the index type for Spherical Geospatial Indexes will become 2dsphere.
The 2.3 series adds a new type of geospatial index that supports improved spherical queries and GeoJSON. Create the index by specifying s2d as the value of the field in the index specification, as any of the following:
db.collection.ensureIndex( { geo: "s2d" } )
db.collection.ensureIndex( { type: 1, geo: "s2d" } )
db.collection.ensureIndex( { geo: "s2d", type: 1 } )
In the first example you create a spherical geospatial index on the field named geo. In the second example, you create a compound index where the first field is a normal index and the second field is a spherical geospatial index. Unlike 2d indexes, fields indexed using the s2d type do not have to be the first field in a compound index.
At the moment, you must store data in fields indexed using the s2d index according to the GeoJSON specification. Support for storing points in the form used by the existing 2d (i.e. geospatial) indexes is forthcoming. Currently, s2d indexes only support the following GeoJSON shapes:
Point, as in the following:
{ "type": "Point", "coordinates": [ 40, 5 ] }
LineString, as in the following:
{ "type": "LineString", "coordinates": [ [ 40, 5 ], [ 41, 6 ] ] }
Polygon, as in the following:
{ "type": "Polygon", "coordinates": [ [ [ 40, 5 ], [ 40, 6 ], [ 41, 6 ], [ 41, 5 ], [ 40, 5 ] ] ] }
To query s2d indexes, use all of the current geospatial query operators plus the new $intersect operator. Currently, all queries using the s2d index must pass the query selector (e.g. $near, $intersect) a GeoJSON document. With the exception of the GeoJSON requirement, the operation of $near is the same for s2d indexes as for 2d indexes.
The $intersect operator selects all indexed points that intersect with the provided geometry (i.e. Point, LineString, or Polygon). You must pass $intersect a document in GeoJSON format.
db.collection.find( { $intersect: { "type": "Point", "coordinates": [ 40, 5 ] } } )
This query will select all indexed objects that intersect with the Point with the coordinates [ 40, 5 ]. MongoDB will return documents as intersecting if they have a shared edge.
If your mongod instance was building an index when it shut down or terminated, mongod will now continue building the index when it restarts. Previously, the index build had to finish before mongod shut down.
To disable this behavior, the 2.3 series adds a new run-time option, noIndexBuildRetry (or --noIndexBuildRetry on the command line), for mongod. noIndexBuildRetry prevents mongod from continuing to build indexes that were not finished building when the mongod last shut down.
By default, mongod will attempt to rebuild unfinished indexes upon start-up if it shut down or stopped in the middle of an index build. When enabled, this run-time option prevents that behavior.
To support easy-to-configure and evenly distributed shard keys, version 2.3 adds a new “hashed” index type that indexes based on hashed values. This section introduces and documents both the new index type and its use in sharding:
The new hashed index exists primarily to support automatically hashed shard keys. Consider the following properties of hashed indexes:
Hashed indexes must only have a single field, and cannot be compound indexes.
Fields indexed with hashed indexes must not hold arrays. Hashed indexes cannot be multikey indexes.
Hashed indexes cannot have a unique constraint.
You may create hashed indexes with the sparse property.
MongoDB can use the hashed index to support equality queries, but cannot use these indexes for range queries.
Hashed indexes offer no performance advantage over normal indexes. However, hashed indexes may be smaller than a normal index when the values of the indexed field are larger than 64 bits. [2]
It is possible to have both a hashed and a non-hashed index on the same field; MongoDB will use the non-hashed index for range queries.
Warning
Hashed indexes round floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3 as for one that held 2.2. To prevent collisions, do not use a hashed index for floating point numbers that cannot be consistently converted to 64-bit integers (and then back to floating point). Hashed indexes do not support floating point values larger than 2^53.
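The rounding behavior is easy to see in plain JavaScript, where Math.trunc mirrors the conversion to an integer before hashing:

```javascript
// 2.3 and 2.2 truncate to the same 64-bit integer, so a hashed index
// would store the same hash for both values.
console.log(Math.trunc(2.3) === Math.trunc(2.2)); // true
console.log(Math.trunc(2.3)); // 2
```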
Create a hashed index using an operation that resembles the following:
db.records.ensureIndex( { a: "hashed" } )
This operation creates a hashed index for the records collection on the a field.
| [2] | The hash stored in the hashed index is 64 bits long. |
To shard a collection using a hashed shard key, issue an operation in the mongo shell that resembles the following:
sh.shardCollection( "records.active", { a: "hashed" } )
This operation shards the active collection in the records database, using a hash of the a field as the shard key. Consider the following properties when using a hashed shard key:
As with other kinds of shard key indexes, if your collection has data, you must create the hashed index before sharding. If your collection does not have data, sharding the collection will create the appropriate index.
The mongos will route all equality queries to a specific shard or set of shards; however, the mongos must route range queries to all shards.
When using a hashed shard key on a new collection, MongoDB automatically pre-splits the range of 64-bit hash values into chunks. By default, the initial number of chunks is equal to twice the number of shards at creation time. You can change the number of chunks created, using the numInitialChunks option, as in the following invocation of shardCollection:
db.adminCommand( { shardCollection: "test.collection",
key: { a: "hashed"},
numInitialChunks: 2001 } )
MongoDB will only pre-split chunks when sharding an empty collection; it will not create chunk splits when sharding a collection that already has data.
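The pre-split can be sketched in plain JavaScript using BigInt arithmetic (a simplified model; initialSplitPoints is hypothetical, and the server's actual split-point selection may differ):

```javascript
// Divide the signed 64-bit hash range into numChunks equal chunks and
// return the split points between them.
function initialSplitPoints(numChunks) {
  const min = -(2n ** 63n); // lowest 64-bit hash value
  const span = 2n ** 64n;   // total size of the hash range
  const points = [];
  for (let i = 1n; i < BigInt(numChunks); i++) {
    points.push(min + (span * i) / BigInt(numChunks));
  }
  return points;
}

console.log(initialSplitPoints(4)); // [ -4611686018427387904n, 0n, 4611686018427387904n ]
```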
Warning
Avoid using hashed shard keys when the hashed field has non-integral floating point values, see hashed indexes for more information.
Other MongoDB release notes:
These release notes outline a change to all driver interfaces released in November 2012. See release notes for specific drivers for additional information.
As of the releases listed below, there are two major changes to all drivers:
All drivers will add a new top-level connection class that will increase consistency for all MongoDB client interfaces.
This change is non-backward breaking: existing connection classes will remain in all drivers for a time, and will continue to operate as expected. However, those previous connection classes are now deprecated as of these releases, and will eventually be removed from the driver interfaces.
The new top-level connection class is named MongoClient, or similar depending on how host languages handle namespacing.
The default write concern on the new MongoClient class will be to acknowledge all write operations [1]. This will allow your application to receive acknowledgment of all write operations.
See the documentation of Write Concern for more information about write concern in MongoDB.
Please migrate to the new MongoClient class expeditiously.
| [1] | The drivers will call getLastError without arguments, which is logically equivalent to the w: 1 option; however, this operation allows replica set users to override the default write concern with the getLastErrorDefaults setting in the Replica Set Configuration. |
The following driver releases will include the changes outlined in Changes. See each driver’s release notes for a full account of each release as well as other related driver-specific changes.